5,035 questions
0
votes
0
answers
25
views
Why do I get a "list index out of range" error when writing a SharePoint list to Azure Delta Lake using PySpark on Azure Databricks?
Writing a SharePoint list to Delta file format, I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the datatypes when writing ...
0
votes
0
answers
29
views
How to run MLlib random forest in Databricks on time series data?
I am trying to forecast time series data, namely the "outgoing_quantity". My data has two "ID" columns ("gtin" and "location_id"). For each combination of ...
0
votes
0
answers
85
views
Unable to create credentials in Azure Databricks even after giving the Blob Data Contributor role to my access connector
I am getting the error:
Missing validation token for service principal. Please provide a valid ARM-scoped Entra ID token in the 'X-Databricks-Azure-SP-Management-Token' request header and retry. For ...
Advice
0
votes
0
replies
33
views
Data Pipeline for Bringing Data from Oracle Fusion to Azure Databricks
I am trying to bring Oracle Fusion (SCM, HCM, Finance) data into ADLS Gen2. Databricks is used for data transformation and Power BI for report visualization.
I have 3 options.
Option 1:
...
0
votes
0
answers
62
views
Azure Databricks - use of parameters in SQL DDL in a Notebook
I am trying to set up a notebook to configure the basics of a catalog and its schemas, permissions, etc., so our developers can be "set loose"...
I have a parameter set as a widget: v_catalog ...
Best practices
0
votes
0
replies
37
views
Oracle Fusion to Azure
Could someone help me with the pros and cons of each approach given below?
I am trying to bring Oracle Fusion SCM and HCM data to Azure using Databricks. I am unsure which option is cost-effective; need ...
1
vote
1
answer
66
views
Spark JDBC reading wrong character encoding from PostgreSQL with server_encoding = SQL_ASCII
I'm reading data from a PostgreSQL 8.4 database into PySpark using the JDBC connector.
The database's server_encoding is SQL_ASCII.
When I query the table directly in pgAdmin, names like SÉRGIO or ...
0
votes
2
answers
88
views
How to write parquet file to Databricks Volume?
I'd like to export data from tables within my Databricks Unity Catalog. I'd like to transform each of the tables into a single Parquet file which I can download. I thought I'd just write a table to a ...
0
votes
1
answer
66
views
Union Two Datasets Causes Records to Unexpectedly Filter
NOTE: I am running this query on Azure Databricks in a serverless Notebook.
I have two tables with identical schemas: foo and bar. They have the same number of columns, with the same names, in the same ...
0
votes
0
answers
62
views
WorkspaceClient in Databricks SDK for Python to Connect to Foreign Workspace
Running the following code in a serverless notebook on Azure Databricks:
WORKSPACES = [
{'workspace': 'dev', 'url': 'https://adb-123.x.azuredatabricks.net/'},
{'workspace': 'test', 'url': '...
1
vote
1
answer
88
views
Apache Spark TransformWithState operator not working as expected
Hi, I'm trying to implement a state processor for my custom logic. Ideally, we are streaming and I want the custom logic to calculate packet loss from the previous row.
I implemented the state processor ...
0
votes
1
answer
139
views
Databricks Community Edition: spark.conf.get('spark.sql.adaptiveExecution.enabled') not available on serverless compute
I’m using Databricks Community Edition (Free tier) with Spark 4.0.0. I noticed that the UI no longer allows creating a standard cluster — only the serverless compute option is available.
I tried the ...
0
votes
2
answers
101
views
SQLSTATE: 42K0I error when connecting Databricks to ADLS
Could someone help me with connecting Databricks to ADLS? I have tried connecting with (I suppose) all the ways I could: SAS token, service principal, or even one more way I don't really remember ...
0
votes
0
answers
47
views
Spark: VSAM File read issue with special character
We have a scenario to read a VSAM file directly, along with a copybook to understand the column lengths; we are using the COBRIX library as part of the Spark read.
However, we found the same is not properly ...
0
votes
1
answer
59
views
Unable to authorize databricks from Azure Devops Stage
I'm trying to build a DevOps pipeline that allows me to upload a Python wheel file to my Databricks volume.
However, I keep getting the error: "Error: Authorization failed. Your token may be expired ...
0
votes
0
answers
68
views
How to reference entries in the secret scope when creating a unity catalog connection?
I am creating a Unity Catalog connection to an Oracle database using Terraform, for my Databricks Unity Catalog hosted on Azure.
The creation of the connection works fine, but once I try ...
0
votes
0
answers
81
views
Error when reading .csv.gz files in Databricks
I have small files in .csv.gz compressed format in a GCS bucket; I have mounted it and created external volumes on top of it in Databricks (Unity Catalog enabled). So when I try to read a file with ...
0
votes
0
answers
28
views
STREAMING_CONNECT_SERIALIZATION_ERROR - When sending Events from Databricks Notebook to Azure EventHub
Trying to send streaming data to Azure EventHub using a Databricks notebook. When the notebook runs, I get the following error message:
[STREAMING_CONNECT_SERIALIZATION_ERROR] Cannot serialize the function ...
0
votes
0
answers
43
views
Databricks Cleanroom: Does Invited Collaborator have capability to adjust Timeout Error Timer?
I am running tests on a notebook from within Databricks' "Cleanroom" environment (therefore it is run on serverless compute managed by Databricks).
I'm running into the following timeout ...
0
votes
1
answer
68
views
Manage Z-Order with Predictive Optimization in Databricks
I want to understand how to manage Z-Order in Databricks when using Predictive Optimization (PO). According to the documentation:
"OPTIMIZE does not run ZORDER when executed with predictive ...
1
vote
0
answers
55
views
Databricks - LOCATION_OVERLAP Error with AutoLoader pipeline ingesting from external location
I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
0
votes
0
answers
95
views
Unable to Connect to Databricks Lakebase with PostgreSQL: Error: An existing connection was forcibly closed by the remote host
I have just activated the new Databricks Lakebase. However, when I attempt to connect to the Lakebase with my application I get the error:
Unable to read data from the transport connection: An ...
0
votes
0
answers
67
views
Determine PK Columns on 150 tables in Databricks
I am using Python / PySpark to find the PK columns in Delta tables on Databricks.
There are some tables that have 5 million rows, and there are some tables with 10 columns forming part of the PK.
I ...
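One way to check candidate keys (a sketch only; the table and column names below are placeholders, not from the question) is to generate a COUNT(*) vs. COUNT(DISTINCT ...) comparison per column combination and run each statement via spark.sql():

```python
from itertools import combinations

# Hypothetical helper: build one Spark SQL check per candidate key
# combination. A combination is a key iff the row count equals the
# distinct count over those columns.
def candidate_key_queries(table, cols, max_width=2):
    queries = []
    for width in range(1, max_width + 1):
        for combo in combinations(cols, width):
            key = ", ".join(combo)
            queries.append(
                f"SELECT COUNT(*) = COUNT(DISTINCT {key}) AS is_key FROM {table}"
            )
    return queries

qs = candidate_key_queries("my_table", ["a", "b"])
# 3 queries: (a), (b), (a, b)
```

On wide tables, limiting `max_width` matters: the number of combinations grows combinatorially, and each check is a full scan.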
0
votes
0
answers
74
views
Access task level parameters of databricks job along with parameters passed by airflow job
I have an Airflow DAG which calls a Databricks job that has a task-level parameter defined as job_run_id (job.run_id), with type python_script. When I try to access it using sys.argv and ...
0
votes
0
answers
69
views
Unable to Create an Azure SAS Token to Be Used with Databricks to Connect to Azure ADLS Gen 2
I am trying to establish a connection to our Azure Data Lake Gen2 using a SAS Token.
I have created the following SAS token
spark.conf.set("fs.azure.account.auth.type.adlsprexxxxx.dfs.core....
2
votes
0
answers
200
views
How to Define Reusable Job Cluster Configurations in Databricks Asset Bundles Across Separate YAML Files?
Problem
I’m trying to create reusable job cluster configurations in Databricks Asset Bundles (DAB) that can be referenced across multiple jobs defined in separate YAML files. I want to avoid ...
0
votes
1
answer
114
views
How to set proxy to create WorkspaceClient in Databricks using Java SDK
I am working on Azure Databricks test automation using Java. There are a number of jobs and pipelines created in Azure Databricks to process data. I want to create a WorkspaceClient for them ...
2
votes
2
answers
177
views
Regex expression to avoid space and other characters
I am working with transformation logic in Databricks. Basically, there is a field called rip_fw which has values like "LC.JO.P051S1-1250" and "LF.030707 23:54-496"; as per ...
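The excerpt is truncated, so the exact extraction rule is an assumption; if the goal is to keep only the leading token of rip_fw up to the first whitespace, a plain-Python sketch of the regex (in Spark SQL the same pattern could be used with regexp_extract) might look like:

```python
import re

# Hypothetical helper: keep the leading run of non-whitespace
# characters of rip_fw. The stopping rule (whitespace only) is an
# assumption, since the question excerpt is cut off.
def leading_token(rip_fw: str) -> str:
    m = re.match(r"\S+", rip_fw)
    return m.group(0) if m else ""

print(leading_token("LC.JO.P051S1-1250"))   # LC.JO.P051S1-1250
print(leading_token("LF.030707 23:54-496")) # LF.030707
```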
0
votes
1
answer
109
views
How to dynamically generate SQL to Update/Insert a table in Azure Databricks Notebook
It's a sort of CDC (Change Data Capture) scenario in which I am trying to compare new data (in tblNewData) with old data (in tblOldData), and log the changes into a log table (tblExpectedDataLog) ...
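A common shape for this kind of CDC upsert is to generate a MERGE statement from column lists and run it with spark.sql(). A minimal sketch (the key and data columns below are placeholders; only tblOldData/tblNewData come from the question):

```python
# Hypothetical generator: builds a Delta Lake MERGE statement from a
# list of key columns and data columns. Logging into tblExpectedDataLog
# would be a separate step and is not shown.
def build_merge_sql(target, source, key_cols, data_cols):
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    cols = ", ".join(key_cols + data_cols)
    vals = ", ".join(f"s.{c}" for c in key_cols + data_cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

sql = build_merge_sql("tblOldData", "tblNewData", ["id"], ["name", "qty"])
```

Generating the statement as a string keeps the column lists data-driven, which matches the dynamic-SQL intent of the question.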
1
vote
1
answer
138
views
How to replace existing data in a particular sheet of an existing excel file using pyspark dataframe?
I am using Azure Databricks and Azure Storage Explorer for my operations. I have an Excel file under 30 MB containing multiple sheets. I want to replace the data in one sheet every month when ...
0
votes
0
answers
59
views
Set spark config from init script instead of UI
I need to set Spark config from an init script instead of the UI or a shared notebook.
This is because I have multiple clusters for multiple envs and the config is the same, so I want the clusters to get the Spark ...
0
votes
0
answers
80
views
How to handle Databricks Jobs & Pipelines failure and revert some changes?
I'm running a Databricks job that consists of four notebooks. In Notebook-2, I'm creating an intermediate table in DBFS. The requirement is that if this job fails at any stage after the table creation,...
0
votes
0
answers
52
views
Delta live tables are producing different results
I am trying to perform aggregation on top of a table. I applied the same aggregation in a DLT pipeline and a PySpark query, but the results are different.
My pyspark query looks like below: -
agg_df = filter_df....
0
votes
1
answer
53
views
Global temp view shows up empty when passed from a PySpark to a Scala notebook in Databricks
I have an Azure Databricks workflow which runs a PySpark notebook, which in turn calls a (legacy) Scala notebook for a list of tables. In the PySpark notebook, I save a DataFrame to a global temp view and ...
0
votes
1
answer
114
views
Orchestration control in Data Pipeline in Azure
I am designing the Data Pipeline which consumes data from Salesforce using bulk API endpoint (pull mechanism).
The data comes and lands in an ADLS Gen2 Bronze Layer.
The next transformation job will start ...
0
votes
0
answers
74
views
SparkRuntimeException Error while displaying Spark DataFrame
When I display my Spark df, I run into a SparkRuntimeException. I am unsure what it means or what I need to fix.
file_path = "/Volumes/filepath/file.xlsx"
df = spark.read \
.format("...
0
votes
1
answer
434
views
Error Accessing Volume Data in Databricks Trial: "Maximum Number of Retries Exceeded"
I’m currently learning Databricks using a trial account. I created a volume and successfully loaded data into it. However, when trying to access the file using Spark, I encountered the following error:...
0
votes
1
answer
43
views
Unable to run neo4j create constraint cypher query from Databricks using pyspark connector
I am using a Databricks notebook and the Neo4j Spark connector to run a Cypher query to create constraints. While executing, it gives an error. I tried multiple ways, changing the Databricks runtime version ...
0
votes
0
answers
54
views
BigQuery connection fails in Azure Databricks when in Delta Live Table (DLT)
The same BigQuery connection in Azure Databricks works correctly in a notebook and causes problems in a Delta Live Table (DLT) pipeline.
In a notebook this runs correctly on serverless and standard ...
3
votes
2
answers
2k
views
Public DBFS root is disabled. Access is denied on path in Databricks community version
I am trying to get familiar with Databricks Community Edition. I successfully uploaded a table using the upload data feature. Now when I try to use the function .show(), it gives me an error.
The picture is ...
3
votes
1
answer
322
views
On-behalf-of token provided by Databricks Apps is not recognizing the scopes added
I have just started exploring Databricks Apps for building a Flask/Dash based data app. To start with, I am playing with a hello-world template to understand on-behalf-of user authentication. I am ...
0
votes
1
answer
133
views
Migrate Hive table to Unity Catalog with dynamic folder paths
I'm migrating tables from hive_metastore to Unity Catalog on Databricks.
We have a process that runs twice a day, writing data into folders using the following format:
<YYYYMMDDHHMM>_<...
0
votes
2
answers
198
views
Assign groups to databricks workspace - REST API
I'm having trouble assigning account-level groups to my Databricks workspace. I've authenticated at the account level to retrieve all created groups, and applied transformations to filter only the ...
0
votes
1
answer
239
views
Databricks Unity Catalog Error: [UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE]
I’m migrating objects from hive_metastore to Unity Catalog on Databricks.
Some of my legacy tables are stored as textfile.
When I try to sync these tables into the Unity Catalog, I get the following ...
1
vote
1
answer
123
views
Copying Missing store_id Records from Previous Month Data with Specific Grouping Logic in Databricks
I'm working in a Databricks notebook using PySpark to process monthly sales data. I have Apache Spark DataFrames for the current_month_data and previous_month_data, with visit_date and ...
1
vote
1
answer
118
views
Pyspark read data to dataframe as decimal
Hello, I am trying to read Dataverse data into Databricks. Generally, getting the data over the API works fine.
But converting the data into a PySpark dataframe throws errors if the Dataverse data ...
0
votes
1
answer
194
views
Do Table ACLs and row- and column-level security with Unity Catalog only apply when accessing tables in Databricks Unity Catalog?
I will be implementing Table ACLs and row- and column-level security with Unity Catalog.
While it is possible to achieve row- and column-level security with Unity Catalog, will the row- and column-level ...
1
vote
1
answer
2k
views
How to get the Databricks job ID at run time
I am trying to get the job ID and run ID of a Databricks job dynamically and store them in a table with the code below:
run_id = self.spark.conf.get("spark.databricks.job.runId", "...
0
votes
1
answer
121
views
Spark: convert a row into multiple rows
Spark: convert half of each row's data into the next row.
I have a CSV file where each line has an even number of words separated by commas. I want to read the CSV file and put half the data from each row into the next row ...
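The splitting rule described above can be sketched in plain Python (in Spark the same idea could be expressed with split() on the line plus an explode over the two halves, but this shows the core logic):

```python
# Split one CSV line with an even number of words into two rows:
# the first half of the words and the second half.
def split_row(line: str):
    words = line.split(",")
    half = len(words) // 2
    return [",".join(words[:half]), ",".join(words[half:])]

rows = split_row("a,b,c,d")  # ["a,b", "c,d"]
```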
0
votes
2
answers
76
views
How to Convert grouped names into distinct person entries with country preserved
I'm working with a PySpark DataFrame that looks something like this:
+--------------+-----------+
|customer_names|country |
+--------------+-----------+
|jan,marek |Poland |
|anna,kasia ...
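The reshaping asked for here — one (name, country) row per comma-separated name — can be sketched in plain Python (in PySpark, the equivalent would typically be split() on customer_names followed by explode()):

```python
# Expand each (customer_names, country) pair into one row per person,
# preserving the country. Input rows mirror the table in the question.
def explode_names(rows):
    out = []
    for names, country in rows:
        for name in names.split(","):
            out.append((name.strip(), country))
    return out

result = explode_names([("jan,marek", "Poland"), ("anna,kasia", "Poland")])
# [("jan", "Poland"), ("marek", "Poland"), ("anna", "Poland"), ("kasia", "Poland")]
```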