5,035 questions
0
votes
0
answers
25
views
Why do I get a "list index out of range" error when writing a SharePoint list to Azure Delta Lake using PySpark on Azure Databricks?
Writing a SharePoint list to Delta file format, I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the datatypes when writing ...
0
votes
0
answers
29
views
How to run MLlib random forest in Databricks on time series data?
I am trying to forecast time series data, namely the "outgoing_quantity". My data has two "ID" columns ("gtin" and "location_id"). For each combination of ...
0
votes
0
answers
85
views
Unable to create credentials in Azure Databricks even after giving the Blob Data Contributor role to my access connector
I am getting the error:
Missing validation token for service principal. Please provide a valid ARM-scoped Entra ID token in the 'X-Databricks-Azure-SP-Management-Token' request header and retry. For ...
Advice
0
votes
0
replies
33
views
Data Pipeline for Bringing Data from Oracle Fusion to Azure Databricks
I am trying to bring Oracle Fusion (SCM, HCM, Finance) data into ADLS Gen2. Databricks is used for data transformation and Power BI for report visualization.
I have 3 options.
Option 1:
...
0
votes
0
answers
62
views
Azure Databricks - use of parameters in SQL DDL in a Notebook
I am trying to set up a notebook to configure the basics of a catalog and its schemas, permissions, etc., so our developers can be "set loose"...
I have a parameter set as a widget: v_catalog ...
Best practices
0
votes
0
replies
37
views
Oracle Fusion to Azure
Could someone help me with the pros and cons of each approach given below?
I am trying to bring Oracle Fusion SCM and HCM data to Azure using Databricks. I am unsure which option is cost-effective; need ...
1
vote
1
answer
66
views
Spark JDBC reading wrong character encoding from PostgreSQL with server_encoding = SQL_ASCII
I'm reading data from a PostgreSQL 8.4 database into PySpark using the JDBC connector.
The database's server_encoding is SQL_ASCII.
When I query the table directly in pgAdmin, names like SÉRGIO or ...
0
votes
2
answers
88
views
How to write parquet file to Databricks Volume?
I'd like to export data from tables within my Databricks Unity Catalog. I'd like to transform each of the tables into a single Parquet file which I can download. I thought I'd just write a table to a ...
0
votes
1
answer
66
views
Union Two Datasets Causes Records to Unexpectedly Filter
NOTE: I am running this query on Azure Databricks in a serverless Notebook.
I have two tables with identical schemas: foo and bar. They have the same number of columns, with the same names, in the same ...
0
votes
0
answers
62
views
WorkspaceClient in Databricks SDK for Python to Connect to Foreign Workspace
Running the following code in a serverless notebook on Azure Databricks:
WORKSPACES = [
{'workspace': 'dev', 'url': 'https://adb-123.x.azuredatabricks.net/'},
{'workspace': 'test', 'url': '...
1
vote
1
answer
88
views
Apache Spark TransformWithState operator not working as expected
Hi, I'm trying to implement a state processor for my custom logic. Ideally, we are streaming and I want the custom logic to calculate packet loss from the previous row.
I implemented the state processor ...
0
votes
1
answer
139
views
Databricks Community Edition: spark.conf.get('spark.sql.adaptiveExecution.enabled') not available on serverless compute
I’m using Databricks Community Edition (Free tier) with Spark 4.0.0. I noticed that the UI no longer allows creating a standard cluster — only the serverless compute option is available.
I tried the ...
0
votes
2
answers
101
views
SQLSTATE: 42K0I error when connecting Databricks to ADLS
Could someone help me with connecting Databricks to ADLS? I have tried connecting with (I suppose) all the ways I could: SAS token, service principal, or even one more way I don't really remember ...
0
votes
0
answers
47
views
Spark: VSAM File read issue with special character
We have a scenario to read a VSAM file directly, along with a copybook to understand the column lengths; we are using the COBRIX library as part of the Spark read.
However, we found the same is not properly ...
0
votes
1
answer
59
views
Unable to authorize databricks from Azure Devops Stage
I'm trying to build a DevOps pipeline that allows me to upload a Python wheel file to my Databricks volume.
However, I keep getting the error: "Error: Authorization failed. Your token may be expired ...
0
votes
0
answers
68
views
How to reference entries in the secret scope when creating a unity catalog connection?
I am creating a Unity Catalog connection to an Oracle database using Terraform, for my Databricks Unity Catalog hosted on Azure.
The creation of the connection works fine, but once I try ...
0
votes
0
answers
81
views
Error when reading .csv.gz files in Databricks
I have small files in .csv.gz compressed format in a GCS bucket; I have mounted it and created external volumes on top of it in Databricks (Unity Catalog enabled). So when I try to read a file with ...
0
votes
0
answers
28
views
STREAMING_CONNECT_SERIALIZATION_ERROR - When sending Events from Databricks Notebook to Azure EventHub
Trying to send streaming data to Azure EventHub using a Databricks notebook. When the notebook runs, I get the following error message:
[STREAMING_CONNECT_SERIALIZATION_ERROR] Cannot serialize the function ...
0
votes
0
answers
43
views
Databricks Cleanroom: Does Invited Collaborator have capability to adjust Timeout Error Timer?
I am running tests on a notebook from within Databricks' "Cleanroom" environment (therefore it is run on serverless compute managed by Databricks).
I'm running into the following timeout ...
0
votes
1
answer
68
views
Manage Z-Order with Predictive Optimization in Databricks
I want to understand how to manage Z-Order in Databricks when using Predictive Optimization (PO). According to the documentation:
"OPTIMIZE does not run ZORDER when executed with predictive ...
1
vote
0
answers
55
views
Databricks - LOCATION_OVERLAP Error with AutoLoader pipeline ingesting from external location
I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
0
votes
0
answers
95
views
Unable to Connect to Databricks Lakebase with PostgreSQL: Error: An existing connection was forcibly closed by the remote host
I have just activated the new Databricks Lakebase. However, when I attempt to connect to the Lakebase with my application I get the error:
Unable to read data from the transport connection: An ...
0
votes
0
answers
67
views
Determine PK Columns on 150 tables in Databricks
I am using Python / PySpark to find the PK columns in Delta tables on Databricks.
There are some tables that have 5 million rows, and there are some tables with 10 columns forming part of the PK.
I ...
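One way to check candidate keys (a sketch only; the table and column names below are placeholders, not from the question) is to generate a COUNT(*) vs. COUNT(DISTINCT ...) comparison per column combination and run each statement via spark.sql():

```python
from itertools import combinations

# Hypothetical helper: build one Spark SQL check per candidate key
# combination. A combination is a key iff the row count equals the
# distinct count over those columns.
def candidate_key_queries(table, cols, max_width=2):
    queries = []
    for width in range(1, max_width + 1):
        for combo in combinations(cols, width):
            key = ", ".join(combo)
            queries.append(
                f"SELECT COUNT(*) = COUNT(DISTINCT {key}) AS is_key FROM {table}"
            )
    return queries

qs = candidate_key_queries("my_table", ["a", "b"])
# 3 queries: (a), (b), (a, b)
```

On wide tables, limiting `max_width` matters: the number of combinations grows combinatorially, and each check is a full scan.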
0
votes
0
answers
74
views
Access task level parameters of databricks job along with parameters passed by airflow job
I have an Airflow DAG which calls a Databricks job that has a task-level parameter defined as job_run_id (job.run_id), with type python_script. When I try to access it using sys.argv and ...
0
votes
0
answers
69
views
Unable to Create an Azure SAS Token to Be Used with Databricks to Connect to Azure ADLS Gen 2
I am trying to establish a connection to our Azure Data Lake Gen2 using a SAS Token.
I have created the following SAS token
spark.conf.set("fs.azure.account.auth.type.adlsprexxxxx.dfs.core....
2
votes
0
answers
200
views
How to Define Reusable Job Cluster Configurations in Databricks Asset Bundles Across Separate YAML Files?
Problem
I’m trying to create reusable job cluster configurations in Databricks Asset Bundles (DAB) that can be referenced across multiple jobs defined in separate YAML files. I want to avoid ...
0
votes
1
answer
114
views
How to set proxy to create WorkspaceClient in Databricks using Java SDK
I am working on Azure Databricks test automation using Java. There are a number of jobs and pipelines created in Azure Databricks to process data. I want to create a WorkspaceClient for them ...
2
votes
2
answers
177
views
Regex expression to avoid space and other characters
I am working with transformation logic in Databricks. Basically, there is a field called rip_fw which has values like "LC.JO.P051S1-1250" and "LF.030707 23:54-496"; as per ...
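The excerpt is truncated, so the exact extraction rule is an assumption; if the goal is to keep only the leading token of rip_fw up to the first whitespace, a plain-Python sketch of the regex (in Spark SQL the same pattern could be used with regexp_extract) might look like:

```python
import re

# Hypothetical helper: keep the leading run of non-whitespace
# characters of rip_fw. The stopping rule (whitespace only) is an
# assumption, since the question excerpt is cut off.
def leading_token(rip_fw: str) -> str:
    m = re.match(r"\S+", rip_fw)
    return m.group(0) if m else ""

print(leading_token("LC.JO.P051S1-1250"))   # LC.JO.P051S1-1250
print(leading_token("LF.030707 23:54-496")) # LF.030707
```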
0
votes
1
answer
109
views
How to dynamically generate SQL to Update/Insert a table in Azure Databricks Notebook
It's a sort of CDC (Change Data Capture) scenario in which I am trying to compare new data (in tblNewData) with old data (in tblOldData), and log the changes into a log table (tblExpectedDataLog) ...
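A common shape for this kind of CDC upsert is to generate a MERGE statement from column lists and run it with spark.sql(). A minimal sketch (the key and data columns below are placeholders; only tblOldData/tblNewData come from the question):

```python
# Hypothetical generator: builds a Delta Lake MERGE statement from a
# list of key columns and data columns. Logging into tblExpectedDataLog
# would be a separate step and is not shown.
def build_merge_sql(target, source, key_cols, data_cols):
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in data_cols)
    cols = ", ".join(key_cols + data_cols)
    vals = ", ".join(f"s.{c}" for c in key_cols + data_cols)
    return (
        f"MERGE INTO {target} t USING {source} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

sql = build_merge_sql("tblOldData", "tblNewData", ["id"], ["name", "qty"])
```

Generating the statement as a string keeps the column lists data-driven, which matches the dynamic-SQL intent of the question.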
1
vote
1
answer
138
views
How to replace existing data in a particular sheet of an existing excel file using pyspark dataframe?
I am using Azure Databricks and Azure Storage Explorer for my operations. I have an Excel file under 30 MB containing multiple sheets. I want to replace the data in one sheet every month when ...
0
votes
0
answers
59
views
Set spark config from init script instead of UI
I need to set Spark config from an init script instead of the UI or a shared notebook.
This is because I have multiple clusters for multiple envs and the config is the same, so I want the clusters to get the Spark ...
0
votes
0
answers
80
views
How to handle Databricks Jobs & Pipelines failure and revert some changes?
I'm running a Databricks job that consists of four notebooks. In Notebook-2, I'm creating an intermediate table in DBFS. The requirement is that if this job fails at any stage after the table creation,...
0
votes
0
answers
52
views
Delta live tables are producing different results
I am trying to perform aggregation on top of a table. I applied the same aggregation in a DLT pipeline and a PySpark query, but the results are different.
My pyspark query looks like below: -
agg_df = filter_df....
0
votes
1
answer
53
views
Global temp view shows up empty when passed from a PySpark to a Scala notebook in Databricks
I have an Azure Databricks workflow which runs a PySpark notebook, which in turn calls a (legacy) Scala notebook for a list of tables. In the PySpark notebook, I save a DataFrame to a global temp view and ...
0
votes
1
answer
114
views
Orchestration control in Data Pipeline in Azure
I am designing the Data Pipeline which consumes data from Salesforce using bulk API endpoint (pull mechanism).
The data comes and lands in an ADLS Gen2 Bronze Layer.
The next transformation job will start ...
0
votes
0
answers
74
views
SparkRuntimeException Error while displaying Spark DataFrame
When I display my Spark df, I run into a SparkRuntimeException. I am unsure what it means or what I need to fix.
file_path = "/Volumes/filepath/file.xlsx"
df = spark.read \
.format("...
0
votes
1
answer
434
views
Error Accessing Volume Data in Databricks Trial: "Maximum Number of Retries Exceeded"
I’m currently learning Databricks using a trial account. I created a volume and successfully loaded data into it. However, when trying to access the file using Spark, I encountered the following error:...
0
votes
1
answer
43
views
Unable to run neo4j create constraint cypher query from Databricks using pyspark connector
I am using a Databricks notebook and the Neo4j Spark connector to run a Cypher query to create constraints. While executing, it gives an error. I tried multiple ways, changing the Databricks runtime version ...
0
votes
0
answers
54
views
BigQuery connection fails in Azure Databricks when in Delta Live Table (DLT)
The same BigQuery connection in Azure Databricks works correctly in a notebook and causes problems in a Delta Live Table (DLT) pipeline.
In a notebook this runs correctly on serverless and standard ...
3
votes
2
answers
2k
views
Public DBFS root is disabled. Access is denied on path in Databricks community version
I am trying to get familiar with Databricks Community Edition. I successfully uploaded a table using the upload data feature. Now when I try to use the function .show(), it gives me an error.
The picture is ...
3
votes
1
answer
322
views
On-behalf-of token provided by Databricks Apps is not recognizing the scopes added
I have just started exploring Databricks Apps for building a Flask/Dash based data app. To start with, I am playing with a hello-world template to understand on-behalf-of user authentication. I am ...
0
votes
1
answer
133
views
Migrate Hive table to Unity Catalog with dynamic folder paths
I'm migrating tables from hive_metastore to Unity Catalog on Databricks.
We have a process that runs twice a day, writing data into folders using the following format:
<YYYYMMDDHHMM>_<...
0
votes
2
answers
198
views
Assign groups to databricks workspace - REST API
I'm having trouble assigning account-level groups to my Databricks workspace. I've authenticated at the account level to retrieve all created groups, and applied transformations to filter only the ...
0
votes
1
answer
239
views
Databricks Unity Catalog Error: [UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE]
I’m migrating objects from hive_metastore to Unity Catalog on Databricks.
Some of my legacy tables are stored as textfile.
When I try to sync these tables into the Unity Catalog, I get the following ...
1
vote
1
answer
123
views
Copying Missing store_id Records from Previous Month Data with Specific Grouping Logic in Databricks
I'm working in a Databricks notebook using PySpark to process monthly sales data. I have Apache Spark DataFrames for the current_month_data and previous_month_data, with visit_date and ...
1
vote
1
answer
118
views
Pyspark read data to dataframe as decimal
Hello, I am trying to read Dataverse data into Databricks. Generally, getting the data over the API works fine.
But converting the data into a PySpark dataframe throws errors if the Dataverse data ...
0
votes
1
answer
194
views
Do Table ACLs and row- and column-level security with Unity Catalog only apply when accessing tables in Databricks Unity Catalog?
I will be implementing Table ACLs and row- and column-level security with Unity Catalog.
While it is possible to achieve row- and column-level security with Unity Catalog, will the row- and column-level ...
1
vote
1
answer
2k
views
How to get the Databricks job ID at run time
I am trying to get the job ID and run ID of a Databricks job dynamically and store them in a table with the code below:
run_id = self.spark.conf.get("spark.databricks.job.runId", "...
0
votes
1
answer
121
views
Spark: convert a row into multiple rows
Spark: convert half of each row's data into the next row.
I have a CSV file where each line has an even number of words separated by commas. I want to read the CSV file and put half the data from each row into the next row ...
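The splitting rule described above can be sketched in plain Python (in Spark the same idea could be expressed with split() on the line plus an explode over the two halves, but this shows the core logic):

```python
# Split one CSV line with an even number of words into two rows:
# the first half of the words and the second half.
def split_row(line: str):
    words = line.split(",")
    half = len(words) // 2
    return [",".join(words[:half]), ",".join(words[half:])]

rows = split_row("a,b,c,d")  # ["a,b", "c,d"]
```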
0
votes
2
answers
76
views
How to Convert grouped names into distinct person entries with country preserved
I'm working with a PySpark DataFrame that looks something like this:
+--------------+-----------+
|customer_names|country |
+--------------+-----------+
|jan,marek |Poland |
|anna,kasia ...
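The reshaping asked for here — one (name, country) row per comma-separated name — can be sketched in plain Python (in PySpark, the equivalent would typically be split() on customer_names followed by explode()):

```python
# Expand each (customer_names, country) pair into one row per person,
# preserving the country. Input rows mirror the table in the question.
def explode_names(rows):
    out = []
    for names, country in rows:
        for name in names.split(","):
            out.append((name.strip(), country))
    return out

result = explode_names([("jan,marek", "Poland"), ("anna,kasia", "Poland")])
# [("jan", "Poland"), ("marek", "Poland"), ("anna", "Poland"), ("kasia", "Poland")]
```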