
We are using the google-php-client-api to stream website page-view logs into a table with 9 columns, all basic data types:

  • cookieid (string)
  • domain (string)
  • site_category (string)
  • site_subcategory (string)
  • querystring (string)
  • connectiontime (timestamp)
  • flag (boolean)
  • duration (integer)
  • remoteip (string)
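For illustration, one log row matching this 9-column schema might be built like this (the field names follow the schema above; the values are made up):

```php
<?php
// Hypothetical example of one page-view log row for the schema above.
$rawjson = json_encode(array(
    'cookieid'         => 'a1b2c3d4',
    'domain'           => 'example.com',
    'site_category'    => 'news',
    'site_subcategory' => 'sports',
    'querystring'      => 'ref=home&utm_source=mail',
    'connectiontime'   => '2015-07-28 12:00:00', // timestamp column
    'flag'             => true,                  // boolean column
    'duration'         => 42,                    // integer column
    'remoteip'         => '203.0.113.7',
));

// The streaming code decodes this back into an object for setJson():
$data = json_decode($rawjson);
echo $data->domain, "\n";
```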

After 10 hours of running the scripts, we observed that the BigQuery API usage (for the insertAll method) reached 300K, but during that time only 35K rows were recorded to the table...

When we looked at the Google Cloud console, approximately 299K of these 300K API calls returned success codes; in other words, the streaming seemed to be working well.

What we don't understand is: after 299K successful requests, how could only 35K rows have been inserted into the table?

Is this a problem caused by the google-php-client-api, or has BigQuery simply not saved the sent data to the table yet?

If the latter is true, how much time do we need to wait before seeing all of the rows sent to BigQuery?

Code used for streaming data:

    // $rawjson is one log row as a JSON string (see the schema above)
    $rows = array();
    $data = json_decode($rawjson);

    $row = new Google_Service_Bigquery_TableDataInsertAllRequestRows();
    $row->setJson($data);
    $row->setInsertId(strtotime('now'));
    $rows[0] = $row;

    $req = new Google_Service_Bigquery_TableDataInsertAllRequest();
    $req->setKind('bigquery#tableDataInsertAllRequest');
    $req->setRows($rows);

    // $this->service is an authenticated Google_Service_Bigquery instance
    $this->service->tabledata->insertAll($projectid, $datasetid, $tableid, $req);

Thank you in advance,

Cihan

  • Yes, but I see that there is a 1-day limitation. What I mean is that, according to SO rules, I should wait 1 more hour before the system allows me to click the tick. Commented Jul 28, 2015 at 12:25
  • You wait one more day. The idea is to learn the process. Thanks. Commented Jul 28, 2015 at 12:37
  • I wrote "1 more hour" because, when I was writing the above comment, 23 hours of the 1-day limitation had already passed. Thank you, best. Commented Jul 29, 2015 at 13:08

1 Answer


We resolved this issue. It turned out to be caused by this line:

    $row->setInsertId(strtotime('now'));

Since we send at least 10-20 requests per second, and this insertId is derived from the current timestamp (which has one-second resolution), every request within the same second carried the same insertId. BigQuery uses insertId for best-effort deduplication, so it was saving only one request per second and rejecting all the others without writing them to the table.
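The one-second resolution is easy to demonstrate with plain PHP, independent of the BigQuery client:

```php
<?php
// strtotime('now') returns a Unix timestamp with one-second
// resolution, so ids generated in a tight loop collide.
$ids = array();
for ($i = 0; $i < 1000; $i++) {
    $ids[] = strtotime('now');
}
$distinct = count(array_unique($ids));
// At 10-20 requests/second, almost every row shares its
// insertId with other rows generated in the same second.
echo $distinct, " distinct id(s) for ", count($ids), " rows\n";
```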

We removed this line, and now the numbers are coherent.
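If you still want BigQuery's best-effort deduplication rather than dropping insertId entirely, one option is an id that is unique per row. This is a sketch using PHP's uniqid(), not the code from the original post:

```php
<?php
// Sketch: a per-row insertId instead of a per-second one.
// uniqid('', true) combines the current microtime with extra
// entropy, so rows built in the same second get different ids.
function makeInsertId() {
    return uniqid('', true);
}

$a = makeInsertId();
$b = makeInsertId();
var_dump($a !== $b); // ids differ even within the same second
```

The generated id would then be passed to $row->setInsertId(makeInsertId()) in place of the timestamp.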


1 Comment

Exactly - insertId allows BigQuery to work with "at-least-once" delivery systems. You can use the id to make sure no duplicate data is inserted.
