1

I would like to know whether the newer Python 3.x versions bring more pipeline I/O connectors and runtime parameters to Apache Beam. If I understand correctly, Apache Beam currently provides only file-based IOs for Python: textio, avroio and tfrecordio. With Java there are more options available, such as file-based IOs, BigQueryIO, BigtableIO, PubSubIO and SpannerIO.

My requirement is to use BigQueryIO in a GCP Dataflow pipeline written in Python 3.x, but it does not currently seem to be available. Does anyone have an update or an ETA on when Apache Beam will make it available?

2 Answers

3

The Bigtable connector for Python 3 has been under development for some time now. There is currently no ETA, but you can follow the relevant pull request in the official Apache Beam repository for further updates.

1 Comment

There was a typo in my question, which I have just corrected. What I am looking for is a BigQueryIO connector for Python 3.
0

BigQueryIO has been available for quite some time in the Apache Beam Python SDK.

Pub/Sub IO is available as well, as is Bigtable (write only). SpannerIO is being worked on as we speak.

This page has more detail https://beam.apache.org/documentation/io/built-in/
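
For illustration, here is a minimal sketch of reading from and writing to BigQuery with the Python SDK. It assumes a recent SDK release that exposes `beam.io.ReadFromBigQuery`; the project, dataset, table and column names are hypothetical placeholders, and so is the small `table_spec` helper.

```python
def table_spec(project, dataset, table):
    """Format a table reference as 'project:dataset.table' (hypothetical helper)."""
    return f"{project}:{dataset}.{table}"


def run(argv=None):
    # Imported inside the function so the helper above stays usable
    # even where Beam and its GCP extras are not installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (
            p
            # Each element read from BigQuery is a dict keyed by column name.
            | "Read" >> beam.io.ReadFromBigQuery(
                query="SELECT name, score FROM `my-project.my_dataset.scores`",
                use_standard_sql=True,
            )
            | "KeepHighScores" >> beam.Filter(lambda row: row["score"] >= 50)
            | "Write" >> beam.io.WriteToBigQuery(
                table_spec("my-project", "my_dataset", "high_scores"),
                schema="name:STRING,score:INTEGER",
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

Calling `run([...])` with the usual Dataflow flags (`--runner=DataflowRunner`, `--project`, `--region`, `--temp_location`) should submit the job. Note that the BigQuery read needs a GCS temp location for its export files.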

UPDATE:

After the OP provided more details, it turned out that using value providers in the BigQuery query string was indeed not supported.

This has been remedied in the following PR: https://github.com/apache/beam/pull/11040 and will most likely be part of the 2.21.0 release.

UPDATE 2: This new feature has been added in the 2.20.0 release of Apache Beam https://beam.apache.org/blog/2020/04/15/beam-2.20.0.html

Hope it solves your problem!

11 Comments

Just as an FYI, I've opened issues.apache.org/jira/browse/BEAM-9305.
Unfinished PR is at: github.com/apache/beam/pull/11040 Expecting the feature to most likely be part of the 2.21.0 release.
@KaustubhGhole the PR has been merged and will most likely be part of the 2.21.0 release. In the meantime, you can add the changes in the PR to your local installation of Beam and run Dataflow with the custom SDK flag using this modified package, if you want to.
2.21.0 is probably still at least 2 months away. It is possible that this fix will be added to 2.20.0 though, which should be out very soon.
The fix was added to 2.20.0, which was released on Friday last week: beam.apache.org/blog/2020/04/15/beam-2.20.0.html Hope it solves your problem!