Tuesday, 6 August 2013

Can't upload to BigQuery from Perl

I am trying to perform an upload to BigQuery from Perl with a sample
schema and some sample data. I ran into dead ends following Google's
documentation, so now I'm trying to mimic what the bq command-line
client successfully does.
I am tracing what bq does by adding a debug print (method, uri, headers,
body) to the request method in httplib2. I am tracing what my Perl library
does by running Dumper on the response, which also includes the
_request that I sent. The pattern in bq is to POST to an upload URL and
get back a location to PUT the data to. bq then monitors the
corresponding job with a series of GET requests and finally reports the
result.
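A minimal sketch of that two-step pattern, assuming the header set seen in the bq trace is what the upload server expects. The function names build_initiate_request and build_upload_request are hypothetical, the job configuration is a stub, and nothing is sent over the network; the code only builds the two requests so they can be compared against a trace.

```python
import json

# Endpoint shape copied from the bq trace; the project ID is a placeholder.
UPLOAD_URL = "https://www.googleapis.com/upload/bigquery/v2/projects/{project}/jobs"


def build_initiate_request(project, job_config, data, token):
    """Step 1: POST the job configuration and announce the upload size."""
    body = json.dumps(job_config).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
        "X-Upload-Content-Type": "application/octet-stream",
        "X-Upload-Content-Length": str(len(data)),
    }
    url = UPLOAD_URL.format(project=project) + "?uploadType=resumable"
    return "POST", url, headers, body


def build_upload_request(location, data, token):
    """Step 2: PUT all the bytes to the Location returned by step 1."""
    total = len(data)  # assumes non-empty data sent in a single chunk
    headers = {
        "Authorization": "Bearer " + token,
        "Content-Length": str(total),
        "Content-Range": "bytes 0-{}/{}".format(total - 1, total),
    }
    return "PUT", location, headers, data


# The two sample rows from the traces, as newline-delimited JSON.
rows = (
    b'{"demo_string":"foo", "demo_integer":"2"}\n'
    b'{"demo_string":"bar", "demo_integer":"3"}\n'
)
post = build_initiate_request(
    "<project id>", {"configuration": {"load": {}}}, rows, "<access token>"
)
put = build_upload_request("<location from the POST response>", rows, "<access token>")
print(post[2]["X-Upload-Content-Length"], put[2]["Content-Range"])
```

For the 84-byte sample payload this yields an X-Upload-Content-Length of 84 and a Content-Range of bytes 0-83/84, the same values that appear in the bq trace.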
In Perl my POST succeeds, but my PUT fails with Invalid Upload Request
(with no hint as to why it is invalid). I am trying to figure out what
difference between the two clients could explain my failure, but I can't
find it.
Here are (with the access_token, IP addresses and project_id elided) the
traces that I get.
For the POST the information from Python is:
(
    u'POST',
    u'https://www.googleapis.com/upload/bigquery/v2/projects/<project ID>/jobs?uploadType=resumable&alt=json',
    {
        'content-length': '442',
        'accept-encoding': 'gzip, deflate',
        'accept': 'application/json',
        'user-agent': u'bq/2.0 google-api-python-client/1.0',
        'X-Upload-Content-Length': '84',
        'X-Upload-Content-Type': 'application/octet-stream',
        'content-type': 'application/json',
        'Authorization': u'Bearer <access token>'
    },
    '{"configuration": {"load": {"sourceFormat": "NEWLINE_DELIMITED_JSON", "destinationTable": {"projectId": "<project id>", "tableId": "demo_api", "datasetId": "tmp_bt"}, "maxBadRecords": 0, "schema": {"fields": [{"type": "STRING", "mode": "required", "name": "demo_string"}, {"type": "INTEGER", "mode": "required", "name": "demo_integer"}]}}}, "jobReference": {"projectId": "<project id>", "jobId": "bqjob_r139e633b7e522cf7_0000014031d9fb49_1"}}'
)
The corresponding Perl gets an apparently successful response object (in
which you can see the _request) of:
$VAR1 = bless( {
    '_protocol' => 'HTTP/1.1',
    '_content' => '',
    '_rc' => '200',
    '_headers' => bless( {
        'connection' => 'close',
        'client-response-num' => 1,
        'location' => 'https://www.googleapis.com/upload/bigquery/v2/projects/<project id>/jobs?uploadType=resumable&upload_id=AEnB2Ur0mdwmZpMot6ftkgj1IkqK0f7oPbZrXWQekUDHK_E2o2HKznJO6DK2xPYCB-nhUGrMrEJJ7z1Tz9Crnka9e5EYGP1lWQ',
        'date' => 'Tue, 06 Aug 2013 20:46:05 GMT',
        'client-ssl-cert-issuer' => '/C=US/O=Google Inc/CN=Google Internet Authority',
        'client-ssl-cipher' => 'RC4-SHA',
        'client-peer' => '<some ip>:443',
        'content-length' => '0',
        'client-date' => 'Tue, 06 Aug 2013 20:46:05 GMT',
        'content-type' => 'text/html; charset=UTF-8',
        'client-ssl-cert-subject' => '/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleapis.com',
        'server' => 'HTTP Upload Server Built on Jul 24 2013 17:20:01 (1374711601)',
        'client-ssl-socket-class' => 'IO::Socket::SSL'
    }, 'HTTP::Headers' ),
    '_msg' => 'OK',
    '_request' => bless( {
        '_content' => '{"configuration":{"load":{"maxBadRecords":0,"destinationTable":{"datasetId":"tmp_bt","tableId":"perl","projectId":<project id>},"sourceFormat":"NEWLINE_DELIMITED_JSON","schema":{"fields":[{"mode":"required","name":"demo_string","type":"STRING"},{"mode":"required","name":"demo_integer","type":"INTEGER"}]}}},"jobReference":{"projectId":<project id>,"jobId":"perlapi_1375821964"}}',
        '_uri' => bless( do{\(my $o = 'https://www.googleapis.com/upload/bigquery/v2/projects/<project id>/jobs?uploadType=resumable')}, 'URI::https' ),
        '_headers' => bless( {
            'user-agent' => 'libwww-perl/6.05',
            'content-type' => 'application/json',
            'accept' => 'application/json',
            ':X-Upload-Content-Type' => 'application/octet-stream',
            'content-length' => 379,
            ':X-Upload-Content-Length' => '84',
            'authorization' => 'Bearer <access token>'
        }, 'HTTP::Headers' ),
        '_method' => 'POST',
        '_uri_canonical' => $VAR1->{'_request'}{'_uri'}
    }, 'HTTP::Request' )
}, 'HTTP::Response' );
And then we have a PUT. On the Python side we sent:
(
    'PUT',
    'https://www.googleapis.com/upload/bigquery/v2/projects/<project id>/jobs?uploadType=resumable&alt=json&upload_id=AEnB2UpWMRCAOffqyR0d7zvGVtD-KWhrC9jGB-q_igecJgoyz_mIHgEFfs9cYoPxUwUxuflQScMzGxDsKKJ_CJPQq4Os-AkdZA',
    {
        'Content-Range': 'bytes 0-83/84',
        'Content-Length': '84',
        'Authorization': u'Bearer <access token>',
        'user-agent': u'bq/2.0'
    },
    <apiclient.http._StreamSlice object at 0x10ce11150>
)
(I have verified that the stream slice object holds the same 84 bytes
that Perl sends.) And here is the Perl failure:
$VAR1 = bless( {
    '_protocol' => 'HTTP/1.1',
    '_content' => '{
"error": {
"errors": [
{
"domain": "global",
"reason": "badRequest",
"message": "Invalid Upload Request"
}
],
"code": 400,
"message": "Invalid Upload Request"
}
}
',
    '_rc' => '400',
    '_headers' => bless( {
        'connection' => 'close',
        'client-response-num' => 1,
        'date' => 'Tue, 06 Aug 2013 20:46:07 GMT',
        'client-ssl-cert-issuer' => '/C=US/O=Google Inc/CN=Google Internet Authority',
        'client-ssl-cipher' => 'RC4-SHA',
        'client-peer' => '<some IP address>:443',
        'content-length' => '193',
        'client-date' => 'Tue, 06 Aug 2013 20:46:07 GMT',
        'content-type' => 'application/json',
        'client-ssl-cert-subject' => '/C=US/ST=California/L=Mountain View/O=Google Inc/CN=*.googleapis.com',
        'server' => 'HTTP Upload Server Built on Jul 24 2013 17:20:01 (1374711601)',
        'client-ssl-socket-class' => 'IO::Socket::SSL'
    }, 'HTTP::Headers' ),
    '_msg' => 'Bad Request',
    '_request' => bless( {
        '_content' => '{"demo_string":"foo", "demo_integer":"2"}
{"demo_string":"bar", "demo_integer":"3"}
',
        '_uri' => bless( do{\(my $o = 'https://www.googleapis.com/upload/bigquery/v2/projects/<project id>/jobs?uploadType=resumable&upload_id=AEnB2Ur0mdwmZpMot6ftkgj1IkqK0f7oPbZrXWQekUDHK_E2o2HKznJO6DK2xPYCB-nhUGrMrEJJ7z1Tz9Crnka9e5EYGP1lWQ')}, 'URI::https' ),
        '_headers' => bless( {
            'user-agent' => 'libwww-perl/6.05',
            ':Content-Length' => '84',
            ':Content-Range' => '0-83/84',
            'content-length' => 84,
            'authorization' => 'Bearer <access token>'
        }, 'HTTP::Headers' ),
        '_method' => 'PUT',
        '_uri_canonical' => $VAR1->{'_request'}{'_uri'}
    }, 'HTTP::Request' )
}, 'HTTP::Response' );
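One mechanical way to look for the difference is to normalize the header names from both PUT traces and diff them. The following is a minimal sketch, not a diagnosis: the helper names normalize and diff_headers are hypothetical, the header lists are copied by hand from the two traces, and stripping the leading ':' assumes that it is only HTTP::Headers' marker for a forced spelling and is not part of the name sent on the wire.

```python
from collections import defaultdict


def normalize(headers):
    """Group header values by lowercased name, dropping a leading ':'."""
    grouped = defaultdict(list)
    for name, value in headers:
        grouped[name.lstrip(":").lower()].append(str(value))
    return dict(grouped)


def diff_headers(left, right):
    """Return {name: (left_values, right_values)} for headers that differ."""
    a, b = normalize(left), normalize(right)
    return {
        name: (a.get(name, []), b.get(name, []))
        for name in sorted(set(a) | set(b))
        if a.get(name, []) != b.get(name, [])
    }


# PUT headers copied from the two traces (token elided as in the post).
bq_put = [
    ("Content-Range", "bytes 0-83/84"),
    ("Content-Length", "84"),
    ("Authorization", "Bearer <access token>"),
    ("user-agent", "bq/2.0"),
]
perl_put = [
    ("user-agent", "libwww-perl/6.05"),
    (":Content-Length", "84"),
    (":Content-Range", "0-83/84"),
    ("content-length", 84),
    ("authorization", "Bearer <access token>"),
]

for name, (bq, perl) in diff_headers(bq_put, perl_put).items():
    print("{}: bq={} perl={}".format(name, bq, perl))
```

On these two traces the diff flags three names: user-agent, content-range (the Perl value has no leading bytes unit), and content-length (which appears twice in the Perl dump, once as ':Content-Length' and once as 'content-length'). The request URLs also differ in that bq appends alt=json. Any of these could be worth changing on the Perl side and retesting one at a time.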
What should I try changing on the Perl side to make BigQuery respond to me
the way it does to bq?
