
File sharing lock count exceeded: Python import to Access table

If anyone can assist with this you’ll be a lifesaver. I have a Python script that automates a manual process in which a user imports a .txt file with 1,500,000 rows into an Access table. In writing the Python, I’ve landed on a bulk insert that takes a DataFrame, splits it into .csv files of some row size like 50,000, and then inserts into the Access table from the individual .csv files.

The problem is that it’s a company PC and I can’t increase the MaxLocksPerFile default value of 9,500. I’m writing 5,000-row .csv files and committing every 10 batches, so it inserts 5,000 rows at a time until it hits 50,000 and then commits. It gets through about 350,000 rows before throwing the ‘File sharing lock count exceeded’ error.

I’ve tried every combination of batch size and commit interval one can conceive. I’ve tried executemany to execute one SQL statement many times, and I’ve tried execute to load all 1.5M rows and then commit them. Everything has failed.
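
For reference, the executemany attempt looked roughly like this (using the same adjusted DataFrame, connection, and table name that appear in the function below):

rows = list(df_adjusted.itertuples(index=False, name=None))
placeholders = ", ".join("?" * len(df_adjusted.columns))
# One INSERT statement executed per row; every uncommitted row still holds a lock
cursor.executemany(f"INSERT INTO [{table_name}] VALUES ({placeholders})", rows)
conn.commit()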

Has anyone done something like this in Access? Also, before you say to use a more robust database: I would if I could. My director still uses Access, so at this point I’m stuck with it. I would use SQL Server if I could.

Code here (the function that imports the data):

import os
import time

import pandas as pd
import pyodbc


def bulk_insert_to_access(file_path, table_name, temp_file_path=None, chunk_size=5000, commit_interval=100):
    """
    Inserts data from a dataframe into Access using an optimized bulk insert.
    Saves the data to temporary CSV files and then runs an
    INSERT INTO ... SELECT against each text file.
    """
    # Step 1: Read the .txt file and prepare the dataframe
    df = pd.read_csv(file_path, delimiter="\t", encoding="utf-8")

    # Adjust your data as needed (like column cleaning, etc.)
    column_specs = get_column_specs(table_name)
    df_adjusted = adjust_data_to_specs(df, column_specs)

    
    if not temp_file_path:
        temp_file_path = os.path.dirname(file_path)  # default to the source file's folder

    # Break DF into chunks
    chunks = [df_adjusted.iloc[i:i + chunk_size] for i in range(0, len(df_adjusted), chunk_size)]

    # List to keep track of temporary files for cleanup later
    chunk_file_paths = []

    # Step 2: Perform the bulk insert via SQL SELECT INTO method
    conn = pyodbc.connect(connect_str)
    cursor = conn.cursor()

    try:
        for idx, chunk in enumerate(chunks):
            # Save the chunk to a temporary CSV file
            chunk_file_path = os.path.join(temp_file_path, f"temp_data_chunk_{idx}.csv")
            chunk.to_csv(chunk_file_path, index=False, header=False)  # Save to temporary file
            chunk_file_paths.append(chunk_file_path)  # Track file for later cleanup

            # Perform the bulk insert for each chunk
            sql = f"""
            INSERT INTO [{table_name}] 
            SELECT * FROM [Text;FMT=TabDelimited;HDR=NO;DATABASE={os.path.dirname(chunk_file_path)}].[{os.path.basename(chunk_file_path)}]
            """
            try:
                # Execute SQL statement to insert data from chunked file
                start_time = time.time()
                cursor.execute(sql)
                
                # Commit after every `commit_interval` chunks
                if (idx + 1) % commit_interval == 0:
                    conn.commit()
                    time.sleep(1)  # Add a small delay after commit to release locks
                
                elapsed_time = time.time() - start_time
                print(f"Bulk insert for chunk {idx} completed in {elapsed_time:.2f} seconds")

            except pyodbc.Error as e:
                print(f"Error during bulk insert for chunk {idx}: {e}")
                conn.rollback()  # Rolls back any uncommitted chunks, not just the current one

        # Commit after the last chunk if not already committed
        conn.commit()

    except pyodbc.Error as e:
        print(f"Error during the bulk insert process: {e}")
        conn.rollback()

    finally:
        # Clean up temporary files (avoid reusing the file_path parameter name)
        for chunk_path in chunk_file_paths:
            if os.path.exists(chunk_path):
                try:
                    os.remove(chunk_path)
                except OSError as e:
                    print(f"Error deleting temporary file {chunk_path}: {e}")

        
        cursor.close()
        conn.close()
  • Why not apply the same strategy but against the original csv without the dataframe and chunking? Commented Dec 17, 2024 at 16:17
  • Do you need to do this transactionally? If not, then just reset the connection. If you do need transactions, you can emulate them by creating a new database file, then transferring the data between databases once the insert is complete. Commented Dec 17, 2024 at 18:24
  • Resolved by calling VBA in Python: def run_macro(macro_name): try: access_app = win32com.client.Dispatch('Access.Application') access_app.Visible = False access_app.OpenCurrentDatabase(access_db_path) access_app.DoCmd.RunMacro(macro_name) access_app.Quit() run_macro("Macro") VBA: Sub ImportTextFileWithSpec() filePath = "file.txt" importSpec = "SpecName" tableName = "Test" DoCmd.TransferText _ TransferType:=acImportDelim, _ SpecificationName:=importSpec, _ TableName:=tableName, _ FileName:=filePath, _ HasFieldNames:=True End Sub Commented Dec 18, 2024 at 4:02
  • Post this as an answer. Commented Dec 18, 2024 at 6:37
  • “I’ve tried execute to load 1.5M rows and then commit them” ... what was the error when you attempted a single call without chunks? Are you the sole user of the database? Does the .accdb sit on a network drive or a local drive? Commented Dec 18, 2024 at 23:37

2 Answers


Consider again the Access SQL text-file insert of the full 1.5 million rows, without chunks, by following the recommendations below:

  1. As a best practice in SQL, explicitly reference columns in the INSERT INTO and SELECT clauses and avoid SELECT *. You can generate the SQL dynamically from the current columns without hard-coding the list.

  2. Ensure that the text file format specified in the query matches the actual output. The current chunk implementation tries to import a comma-delimited file (the default of DataFrame.to_csv) as a tab-delimited file.

  3. Save a schema.ini file in the same directory as the temporary text file so the Access database engine knows the format and, optionally, the individual column data types:

    [my_temp_data.csv]
    
    Format=CSVDelimited
    ColNameHeader=True
    MaxScanRows=500
    

    Be sure to scan enough rows. I found that Access scans only the first few rows and may assume a column of decimal numbers is an integer column, with similar problems when the first few rows contain many NULLs.
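
    If scanning still guesses wrong, schema.ini also accepts explicit per-column types so no scanning is needed. A hypothetical example for a file with an ID, a date, and an amount column:

    [my_temp_data.csv]
    Format=CSVDelimited
    ColNameHeader=True
    Col1=ID Long
    Col2=TransactionDate DateTime
    Col3=Amount Currency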

I have used the above method to migrate 2 to 8+ million rows from CSV files into MS Access database tables within minutes!

Altogether in Python using pandas and pyodbc. The code below assumes the table columns exactly match the DataFrame columns. Note that the SQL string built from adjacent f-strings without comma separators is valid Python (implicit string concatenation).

import os

import pandas as pd
import pyodbc

# Connect to the Access database (connect_str as in the question)
conn = pyodbc.connect(connect_str)
cursor = conn.cursor()

# Read the .txt file and prepare the dataframe
df = pd.read_csv(file_path, delimiter="\t", encoding="utf-8")

# Adjust your data as needed (like column cleaning, etc.)
column_specs = get_column_specs(table_name)
df_adjusted = adjust_data_to_specs(df, column_specs)

# Export full adjusted data to disk as CSV with headers
df_cols = df_adjusted.columns.tolist()
df_adjusted.to_csv(temp_file_path, index=False)
        
# Prepare SQL query using schema.ini to define format
sql = (
    f"INSERT INTO [{table_name}] "
    f"({ ', '.join(df_cols) }) "
    f"SELECT {', '.join(df_cols)} "
    f"FROM [text;database={os.path.dirname(temp_file_path)}]."
    f"[{os.path.basename(temp_file_path)}]"
)

# Execute SQL query
cursor.execute(sql)
conn.commit()
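
To cover recommendation 3 in code, the schema.ini can be written next to the temporary CSV just before running the insert; a minimal sketch (the helper name is hypothetical):

def write_schema_ini(csv_path):
    # Minimal schema.ini so the Access text driver treats the file as
    # comma-delimited with a header row and scans enough rows for typing
    ini_path = os.path.join(os.path.dirname(csv_path), "schema.ini")
    with open(ini_path, "w") as f:
        f.write(f"[{os.path.basename(csv_path)}]\n")
        f.write("Format=CSVDelimited\n")
        f.write("ColNameHeader=True\n")
        f.write("MaxScanRows=500\n")

# Call before cursor.execute(sql) so the settings are picked up
write_schema_ini(temp_file_path)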


Resolved by calling VBA from Python:

import win32com.client

def run_macro(macro_name):
    try:
        access_app = win32com.client.Dispatch('Access.Application')
        access_app.Visible = False
        access_app.OpenCurrentDatabase(access_db_path)
        access_app.DoCmd.RunMacro(macro_name)
        access_app.Quit()
    except Exception as e:
        print(f"An error occurred: {e}")

# Call the function
run_macro("Macro")

1 Comment

To be clear, you are not calling VBA here. Instead, like VBA does, you are interfacing with the MS Access Object Library from Python via the Component Object Model (COM). Other languages can do the same: Java, C#, VB.NET, PowerShell, Perl, PHP, R, etc.
