Nexaspx

Sounds to me like you're describing an ETL pipeline: you load some data, do some intermediate computations, and store the result. Indeed, this gets out of hand when the workflow gets complicated or the data gets too big.

It really depends on your system's needs: how critical it is to your organization, how stable you want it to be, and how important visibility into the workflow is (it's almost always a big plus).

Since you're familiar with Python, you could look at Apache Airflow, an orchestration platform in which you define the steps of your workflow (called operators in Airflow) and the dependencies between them. It's quite verbose and might be overkill, but it's open source.
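To make the idea concrete, here is a toy sketch of what an orchestrator does (this is not Airflow's API, just the dependency-ordering idea): each task declares which tasks it depends on, and the runner executes them in an order that respects those edges.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical task functions; in Airflow each of these would be an operator.
def extract():   return 'raw'
def transform(): return 'clean'
def load():      return 'stored'

tasks = {'extract': extract, 'transform': transform, 'load': load}
# Each key depends on the tasks in its value set.
deps = {'extract': set(), 'transform': {'extract'}, 'load': {'transform'}}

# Run the tasks in dependency order.
order = list(TopologicalSorter(deps).static_order())
results = {name: tasks[name]() for name in order}
```

An orchestrator adds scheduling, retries, and a UI on top of this core idea, which is where the visibility benefit comes from.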

If this is something that's expected to stay small, you could wrap the main functionality in functions: push the conditions further down the stack, decouple your code, and make it more readable:

def extract(source='db'):      # 'from' is a reserved keyword in Python
    if source == 'db':
        yield ...              # batches from the database
    else:
        yield ...              # batches from the other source

def transform(batch):
    ...                        # compute, compute, compute

def load(result, view=True, make=True):
    store(result)
    if view:
        build_view(result)     # renamed so the flag doesn't shadow the function
    if make:
        build_report(result)


for batch in extract():
    result = transform(batch)
    load(result)
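A minimal runnable version of that loop, with toy data and stand-ins for the store/view/report steps (all names here are illustrative, not from your codebase):

```python
def extract(source='db'):
    # Placeholder sources; a real extract would query a DB or read files.
    batches = {'db': [[1, 2], [3, 4]], 'file': [[5, 6]]}
    yield from batches[source]

def transform(batch):
    # Example computation: square each value.
    return [x * x for x in batch]

def load(result, view=True, make=True):
    stored.append(result)            # stand-in for store()
    if view:
        views.append(sum(result))    # stand-in for building a view
    if make:
        reports.append(max(result))  # stand-in for building a report

stored, views, reports = [], [], []
for batch in extract():
    load(transform(batch))
```

Each stage now has a single responsibility, and the driver loop stays three lines no matter how the conditions inside `load` grow.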
