0 votes
0 answers
25 views

I am writing a SharePoint list to Delta file format and I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the datatype when writing ...
Sruthi Gopalakrishnan
0 votes
0 answers
29 views

I am trying to forecast time series data, namely the "outgoing_quantity". My data contains two "ID" columns ("gtin" and "location_id"). For each combination of ...
the_economist
0 votes
0 answers
85 views

I am getting the error: Missing validation token for service principal. Please provide a valid ARM-scoped Entra ID token in the 'X-Databricks-Azure-SP-Management-Token' request header and retry. For ...
kanishk sharma
Advice
0 votes
0 replies
33 views

I am trying to bring Oracle Fusion (SCM, HCM, Finance) data and push it to ADLS Gen2, with Databricks used for data transformation and Power BI for report visualization. I have 3 options. Option 1: ...
Binod Kumar
0 votes
0 answers
62 views

I am trying to set up a notebook to configure the basics of a catalog and its schemas, permissions etc. so that our developers can be "set loose"... I have a parameter set as a widget: v_catalog ...
le Pusscat
Best practices
0 votes
0 replies
37 views

Could someone help me with the pros and cons of each approach given below? I am trying to bring Oracle Fusion SCM and HCM data to Azure using Databricks. I am unsure which option is cost-effective, and need ...
Binod Kumar
1 vote
1 answer
66 views

I'm reading data from a PostgreSQL 8.4 database into PySpark using the JDBC connector. The database's server_encoding is SQL_ASCII. When I query the table directly in pgAdmin, names like SÉRGIO or ...
Thiago Luan
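Accented names coming out garbled from a SQL_ASCII database usually means the raw bytes were decoded with the wrong charset on the client side. A minimal pure-Python sketch of the idea (illustrative only; the bytes and the LATIN1 assumption are invented for the example, and the real fix normally lives in the JDBC connection's charset option):

```python
# Illustration: SQL_ASCII stores raw bytes without declaring an encoding,
# so the client must pick the right one. Here the bytes are assumed to be
# LATIN1; decoding them as LATIN1 recovers the accented name, while
# decoding as UTF-8 fails.
raw = b"S\xc9RGIO"  # 0xC9 is 'É' in LATIN1

name = raw.decode("latin-1")
print(name)  # SÉRGIO

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```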
0 votes
2 answers
88 views

I'd like to export data from tables within my Databricks Unity Catalog, transforming each table into a single Parquet file which I can download. I thought I would just write a table to a ...
the_economist
0 votes
1 answer
66 views

NOTE: I am running this query on Azure Databricks in a serverless Notebook. I have two tables with identical schema: foo and bar. They have the same number of columns, with the same names, in the same ...
Adam • 4,236
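For two tables with identical schemas, the usual Spark SQL tool for this comparison is `SELECT * FROM foo EXCEPT SELECT * FROM bar`. The set logic behind EXCEPT, sketched in plain Python on tuples (rows and values invented for illustration):

```python
# Plain-Python sketch of what EXCEPT does on two tables with identical
# schemas: rows present in foo but absent from bar. EXCEPT is set-based
# (duplicates collapse); EXCEPT ALL would preserve duplicate counts.
foo = [(1, "a"), (2, "b"), (3, "c")]
bar = [(2, "b"), (3, "c")]

only_in_foo = sorted(set(foo) - set(bar))
print(only_in_foo)  # [(1, 'a')]
```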
0 votes
0 answers
62 views

Running the following code in a serverless notebook on Azure Databricks: WORKSPACES = [ {'workspace': 'dev', 'url': 'https://adb-123.x.azuredatabricks.net/'}, {'workspace': 'test', 'url': '...
Adam • 4,236
1 vote
1 answer
88 views

Hi, I'm trying to implement a state processor for my custom logic. Ideally we are streaming, and I want the custom logic of calculating packet loss from the previous row. I implemented the state processor ...
Pranav ramachandran
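Independently of the Spark state-processor API, the per-key state here boils down to remembering the previous row's value. A plain-Python sketch of that "compare with previous row" logic (the field semantics and loss formula are invented for illustration):

```python
# Plain-Python sketch of the state a stream processor would keep per key:
# the previous packet count. A drop relative to the previous row is
# treated as loss; the first row has no predecessor, so it yields None.
def packet_loss(counts):
    prev = None
    for c in counts:
        yield None if prev is None else (prev - c if c < prev else 0)
        prev = c

print(list(packet_loss([100, 98, 98, 95])))  # [None, 2, 0, 3]
```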
0 votes
1 answer
139 views

I’m using Databricks Community Edition (Free tier) with Spark 4.0.0. I noticed that the UI no longer allows creating a standard cluster — only the serverless compute option is available. I tried the ...
Aravindh_P
0 votes
2 answers
101 views

Could someone help me with connecting Databricks to ADLS? I have tried connecting in (I suppose) all the ways I could: SAS token, service principal, and one more way I don't really remember ...
user31706609
0 votes
0 answers
47 views

We have a scenario to read a VSAM file directly, along with a copybook to understand the column lengths; we were using the COBRIX library as part of the Spark read. However, we see the same is not properly ...
Rocky1989 • 409
0 votes
1 answer
59 views

I'm trying to build a DevOps pipeline that uploads a Python wheel file to my Databricks volume. However I keep getting the error: "Error: Authorization failed. Your token may be expired ...
Pranav ramachandran
0 votes
0 answers
68 views

I am creating a Unity Catalog connection to an Oracle database using Terraform, for my Databricks Unity Catalog hosted on Azure. The creation works fine, but once I try ...
gorillanerve
0 votes
0 answers
81 views

I have small files in .csv.gz compressed format in a GCS bucket, have mounted it, and created external volumes on top of it in Databricks (Unity Catalog enabled). So when I try to read a file with ...
Tony • 311
0 votes
0 answers
28 views

I am trying to send streaming data to Azure Event Hubs using a Databricks notebook. When the notebook runs, I get the following error message: [STREAMING_CONNECT_SERIALIZATION_ERROR] Cannot serialize the function ...
shameera2008
0 votes
0 answers
43 views

I am running tests on a notebook from within Databricks' "Cleanroom" environment (therefore it runs on serverless compute managed by Databricks). I'm running into the following timeout ...
sam • 1
0 votes
1 answer
68 views

I want to understand how to manage Z-Order in Databricks when using Predictive Optimization (PO). According to the documentation: "OPTIMIZE does not run ZORDER when executed with predictive ...
Mohamed Mokhtar
1 vote
0 answers
55 views

I am trying to use pipelines in Databricks to ingest data from an external location to the datalake using AutoLoader, and I am facing this issue. I have noticed other posts with similar errors, but in ...
MattSt • 1,203
0 votes
0 answers
95 views

I have just activated the new Databricks Lakebase. However, when I attempt to connect to the Lakebase with my application I get the error: Unable to read data from the transport connection: An ...
Patterson • 3,011
0 votes
0 answers
67 views

I am using Python / PySpark to find the PK columns in Databricks Delta tables. There are some tables that have 5 million rows, and some tables with 10 columns forming part of the PK. I ...
learner • 1,033
0 votes
0 answers
74 views

I have an Airflow DAG which calls a Databricks job that has a task-level parameter defined as job_run_id (job.run_id) and has type python_script. When I try to access it using sys.argv and ...
Diksha Bisht
0 votes
0 answers
69 views

I am trying to establish a connection to our Azure Data Lake Gen2 using a SAS token. I have created the following SAS token config: spark.conf.set("fs.azure.account.auth.type.adlsprexxxxx.dfs.core....
Patterson • 3,011
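For reference, the fixed-SAS-token configuration documented for ABFS on Azure Databricks follows this shape. A sketch only: the storage account name (taken from the truncated excerpt) and the token value are placeholders, and `spark` is the notebook's session.

```python
# Sketch of the documented fixed-SAS-token configuration for ABFS.
# "adlsprexxxxx" and the token value are placeholders, not real values.
storage_account = "adlsprexxxxx"
sas_token = "<sas-token>"

spark.conf.set(
    f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "SAS"
)
spark.conf.set(
    f"fs.azure.sas.token.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider",
)
spark.conf.set(
    f"fs.azure.sas.fixed.token.{storage_account}.dfs.core.windows.net", sas_token
)
```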
2 votes
0 answers
200 views

Problem I’m trying to create reusable job cluster configurations in Databricks Asset Bundles (DAB) that can be referenced across multiple jobs defined in separate YAML files. I want to avoid ...
Harshit Gupta
0 votes
1 answer
114 views

I am working on Azure Databricks Test Automation using Java. There are a number of Jobs and pipelines that are created in Azure Databricks to process data. I want to create WorkspaceClient for them ...
ashish chauhan
2 votes
2 answers
177 views

I am working with transformation logic in Databricks. Basically, there is a field called rip_fw which has values like "LC.JO.P051S1-1250" and "LF.030707 23:54-496", and, as per ...
sayan nandi
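The excerpt is truncated, so the intended transformation is unknown; purely as an illustration, one plausible split of such codes at the last hyphen, in plain Python (in PySpark the same pattern could go into `regexp_extract`):

```python
import re

# Illustration only: the question is cut off, so splitting at the last
# hyphen is a guess. The regex separates the trailing number from the rest.
values = ["LC.JO.P051S1-1250", "LF.030707 23:54-496"]

parsed = [re.match(r"^(.*)-(\d+)$", v).groups() for v in values]
print(parsed)  # [('LC.JO.P051S1', '1250'), ('LF.030707 23:54', '496')]
```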
0 votes
1 answer
109 views

It's a sort of CDC (Change Data Capture) scenario in which I am trying to compare new data (in tblNewData) with old data (in tblOldData), and log the changes into a log table (tblExpectedDataLog) ...
Aza • 27
1 vote
1 answer
138 views

I am using Azure Databricks and Azure Data Storage Explorer for my operations. I have an excel file of under 30 MB containing multiple sheets. I want to replace the data in one sheet every month when ...
spacestar
0 votes
0 answers
59 views

I need to set Spark config from an init script instead of the UI or a shared notebook. This is because I have multiple clusters for multiple envs and the config is the same, so I want the clusters to get the Spark ...
ggirodda • 790
0 votes
0 answers
80 views

I'm running a Databricks job that consists of four notebooks. In Notebook-2, I'm creating an intermediate table in DBFS. The requirement is that if this job fails at any stage after the table creation,...
user22 • 153
0 votes
0 answers
52 views

I am trying to perform aggregation on top of a table. I applied the same aggregation in a DLT pipeline and a PySpark query, but the results are different. My PySpark query looks like this: agg_df = filter_df....
awesome_sangram
0 votes
1 answer
53 views

I have an Azure Databricks workflow which runs a PySpark notebook, which in turn calls a Scala notebook (legacy) for a list of tables. In the PySpark notebook, I save a DataFrame to a GlobalTempView and ...
AGK • 3
0 votes
1 answer
114 views

I am designing the Data Pipeline which consumes data from Salesforce using bulk API endpoint (pull mechanism). The data comes and lands in an ADLS Gen2 Bronze Layer. Next transformation job will start ...
Binod Kumar
0 votes
0 answers
74 views

When I display my Spark df, I run into a SparkRuntimeException. I am unsure what it means or what I need to fix. file_path = "/Volumes/filepath/file.xlsx" df = spark.read \ .format(&...
Evan • 13
0 votes
1 answer
434 views

I’m currently learning Databricks using a trial account. I created a volume and successfully loaded data into it. However, when trying to access the file using Spark, I encountered the following error:...
Learn Hadoop • 3,058
0 votes
1 answer
43 views

I am using a Databricks notebook and the Neo4j Spark connector to run a Cypher query to create constraints. While executing, it gives an error. I tried multiple ways, changing the Databricks runtime version ...
Arbind Chandra
0 votes
0 answers
54 views

The same BigQuery connection in Azure Databricks works correctly in a notebook but causes problems in a Delta Live Tables (DLT) pipeline. In a notebook this runs correctly on serverless and standard ...
Piotr K • 1,055
3 votes
2 answers
2k views

I am trying to get familiar with Databricks Community Edition. I successfully uploaded a table using the upload data feature. Now when I try to use the .show() function, it gives me an error. The picture is ...
Reactoo • 1,074
3 votes
1 answer
322 views

I have just started exploring Databricks Apps for building a Flask/Dash based data app. To start with, I am playing with a hello world template to understand the on-behalf-of user authentication. I am ...
akhil pathirippilly
0 votes
1 answer
133 views

I'm migrating tables from hive_metastore to Unity Catalog on Databricks. We have a process that runs twice a day, writing data into folders using the following format: <YYYYMMDDHHMM>_<...
Tiago • 85
0 votes
2 answers
198 views

I'm having trouble assigning account-level groups to my Databricks workspace. I've authenticated at the account level to retrieve all created groups, applied transformations to filter only the ...
user25060582
0 votes
1 answer
239 views

I’m migrating objects from hive_metastore to Unity Catalog on Databricks. Some of my legacy tables are stored as textfile. When I try to sync these tables into Unity Catalog, I get the following ...
Tiago • 85
1 vote
1 answer
123 views

I'm working in a Databricks notebook using PySpark to process monthly sales data. I have Apache Spark DataFrames for the current_month_data and previous_month_data, with visit_date and ...
Chinnu • 11
1 vote
1 answer
118 views

Hello, I am trying to read Dataverse data into Databricks. Generally, getting the data over the API works fine, but converting the data into a PySpark DataFrame throws errors if the Dataverse data ...
Courier • 95
0 votes
1 answer
194 views

I will be implementing Table ACLs and Row and Column Level Security with Unity Catalog. While it is possible to achieve Row and Column Level Security with Unity Catalog, will the Row and Column Level ...
Patterson • 3,011
1 vote
1 answer
2k views

I am trying to get the job id and run id of a Databricks job dynamically and keep them in a table, with the code below: run_id = self.spark.conf.get("spark.databricks.job.runId", "...
Panda • 683
0 votes
1 answer
121 views

Spark: convert half of row data into the next row. I have a CSV file where each line has an even number of words separated by commas. I want to read the CSV file and put half the data from each row into the next row ...
vinayak_narune
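The row-splitting itself is simple outside Spark; a plain-Python sketch of the described transformation (in PySpark the same function could be applied per line with a flatMap-style expansion):

```python
# Sketch: each CSV line has an even number of comma-separated words;
# emit the first half and the second half as two separate rows.
def split_half(line):
    words = line.split(",")
    mid = len(words) // 2
    return [",".join(words[:mid]), ",".join(words[mid:])]

rows = [r for line in ["a,b,c,d", "e,f"] for r in split_half(line)]
print(rows)  # ['a,b', 'c,d', 'e', 'f']
```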
0 votes
2 answers
76 views

I'm working with a PySpark Data Frame that looks something like this: +--------------+-----------+ |customer_names|country | +--------------+-----------+ |jan,marek |Poland | |anna,kasia ...
james milwaukee
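In PySpark this kind of column is typically expanded with `explode(split(col("customer_names"), ","))`; the equivalent row expansion in plain Python (the second row's country is cut off in the excerpt, so "Poland" is assumed for both rows purely for illustration):

```python
# Plain-Python equivalent of PySpark's split + explode on the
# comma-separated customer_names column from the excerpt.
data = [("jan,marek", "Poland"), ("anna,kasia", "Poland")]

exploded = [
    (name, country) for names, country in data for name in names.split(",")
]
print(exploded)
# [('jan', 'Poland'), ('marek', 'Poland'), ('anna', 'Poland'), ('kasia', 'Poland')]
```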
