I would like to query a UTF-8 encoded CSV file using VBA in Excel 2010 with the following database connection:

provider=Microsoft.Jet.OLEDB.4.0;;data source='xyz';Extended Properties="text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001"

All CSV files start with the BOM \xEF\xBB\xBF followed by the header line. The BOM isn't recognized correctly, so the first column header is read as "?header_name", i.e. a question mark is prepended. I have tried different CharacterSet values and also the Microsoft.ACE.OLEDB.12.0 provider, but nothing has worked so far.

Is this a known bug, or is there a way to get the correct first column header name without changing the encoding of the source files?

  • Would you mind sharing your UTF-8 encoded CSV file? Commented Nov 23, 2015 at 15:31
  • @EEM Any simple CSV file such as a,b,c\n 0.1,0.2,0.3\n with \xEF\xBB\xBF at the beginning has the same problem. Commented Nov 24, 2015 at 14:39
  • 1) I'm curious about the reasons for using only Microsoft.Jet.OLEDB.4.0, and 2) is Connection:="TEXT;Path & Filename" not applicable at all? Commented Nov 24, 2015 at 20:23

2 Answers

The following procedure extracts the entire CSV file into a new sheet and strips the BOM from the header. The path, file name and BOM string are passed as arguments for flexibility.

Use this procedure to call the query procedure:

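Sub Qry_Csv_Utf8()
Const kFile As String = "UTF8 .csv"
Const kPath As String = "D:\StackOverFlow\Temp\"
Dim sBOM As String
    ' VBA string literals have no \x escapes, so "\xEF\xBB\xBF" would be
    ' twelve literal characters. With CharacterSet=65001 the BOM is expected
    ' to decode to U+FEFF; if the header shows three ANSI characters instead,
    ' use Chr$(239) & Chr$(187) & Chr$(191).
    sBOM = ChrW$(&HFEFF)
    Call Ado_Qry_Csv(kPath, kFile, sBOM)
End Sub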

This is the query procedure (it requires a reference to the Microsoft ActiveX Data Objects library):

Sub Ado_Qry_Csv(sPath As String, sFile As String, sBOM As String)
Dim Wsh As Worksheet
Dim AdoConnect As ADODB.Connection
Dim AdoRcrdSet As ADODB.Recordset
Dim i As Integer

    Rem Add New Sheet - Select option required
    'With ThisWorkbook           'Use this if procedure is resident in workbook receiving csv data
    'With Workbooks(WbkName)     'Use this if procedure is not in workbook receiving csv data
    With ActiveWorkbook         'I used this for testing purposes
        Set Wsh = .Sheets.Add(After:=.Sheets(.Sheets.Count))
        'Wsh.Name = NewSheetName        'rename new Sheet
    End With

    Set AdoConnect = New ADODB.Connection
    AdoConnect.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=" & sPath & ";" & _
        "Extended Properties='text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001'"

    Set AdoRcrdSet = New ADODB.Recordset
    AdoRcrdSet.Open Source:="SELECT * FROM [" & sFile & "]", _
        ActiveConnection:=AdoConnect, _
        CursorType:=adOpenDynamic, _
        LockType:=adLockReadOnly, _
        Options:=adCmdText

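    Rem Enter Csv Records in Worksheet
    For i = 0 To AdoRcrdSet.Fields.Count - 1
        Wsh.Cells(1, 1 + i).Value = _
            WorksheetFunction.Substitute(AdoRcrdSet.Fields(i).Name, sBOM, "")
    Next
    Wsh.Cells(2, 1).CopyFromRecordset AdoRcrdSet

    Rem Release the recordset and connection
    AdoRcrdSet.Close
    AdoConnect.Close
    Set AdoRcrdSet = Nothing
    Set AdoConnect = Nothing
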
End Sub
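
If it is unclear how the driver decoded the BOM on a given machine, the header clean-up in the procedure above could be made less dependent on a single BOM string. The following is only a sketch: `StripBom` is a hypothetical helper, not part of the original answer, and it assumes the BOM shows up either as the single character U+FEFF (file read as UTF-8) or as the three characters that the bytes EF BB BF produce when read as Windows-1252.

```vba
' Hypothetical helper (assumption, not from the original answer):
' removes a leading BOM from a field name in either decoded form.
Function StripBom(ByVal sName As String) As String
    Dim sAnsiBom As String

    ' EF BB BF interpreted as three single-byte Windows-1252 characters
    sAnsiBom = ChrW$(&HEF) & ChrW$(&HBB) & ChrW$(&HBF)

    If StrComp(Left$(sName, 1), ChrW$(&HFEFF), vbBinaryCompare) = 0 Then
        ' BOM decoded as the single character U+FEFF
        StripBom = Mid$(sName, 2)
    ElseIf StrComp(Left$(sName, 3), sAnsiBom, vbBinaryCompare) = 0 Then
        ' BOM decoded byte by byte as ANSI characters
        StripBom = Mid$(sName, 4)
    Else
        ' No BOM found: return the name unchanged
        StripBom = sName
    End If
End Function
```

In the header loop this would be called as `Wsh.Cells(1, 1 + i).Value = StripBom(AdoRcrdSet.Fields(i).Name)`. Binary comparison is used on purpose so that the result does not depend on an `Option Compare Text` setting in the module.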
1 Comment

+1 CopyFromRecordset is a good option! It works well with a comma-delimited CSV file. I could not get it to work with a semicolon-delimited file, because Delimited(;) in the connection string does not seem to accept anything but a comma. But since the OP uses commas, that is fine.
The only solution I found for this problem is to use a Schema.ini file.

My test CSV file:

Col_A;Col_B;Col_C
Some text example;123456789;3,14

Schema.ini for my test CSV file:

[UTF-8_Csv_With_BOM.csv] 
Format=Delimited(;)
Col1=Col_A Text
Col2=Col_B Long
Col3=Col_C Double

This Schema.ini file contains the name of the source CSV file and describes its columns. Each column is specified by its name and type, but you can specify more information. The file must be located in the same folder as the CSV file. More details are in Microsoft's documentation for the Schema.ini file (Text File Driver).

Finally, the VBA code that reads the CSV file. Note that HDR=No, because the column headers are defined in Schema.ini.

' Add reference to Microsoft ActiveX Data Objects 6.1 Library
Sub ReadCsv()

    Const filePath As String = "c:\Temp\StackOverflow\"
    Const fileName As String = "UTF-8_Csv_With_BOM.csv"
    Dim conn As ADODB.Connection
    Dim rs As New ADODB.Recordset

    Set conn = New ADODB.Connection
    conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" & filePath & _
        "';Extended Properties='text;HDR=No;FMT=Delimited()';"

    With rs
        .ActiveConnection = conn
        .Open "SELECT * FROM [" & fileName & "]"
        If Not .BOF And Not .EOF Then
            While (Not .EOF)
                Debug.Print rs.Fields("Col_A") & " " & _
                            rs.Fields("Col_B") & " " & _
                            rs.Fields("Col_C")
                .MoveNext
            Wend
        End If
        .Close
    End With

    conn.Close
    Set conn = Nothing

End Sub

Output

Some text example 123456789 3,14

1 Comment

Thanks, I will have a look into it. However, I think it means I would have to programmatically generate the Schema.ini for every different CSV file I would like to process.
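
Generating the Schema.ini entry from VBA could look roughly like the sketch below. `WriteSchemaIni` and its parameters are hypothetical names introduced for illustration; the `Format` and `ColN` keys are the ones used in the answer's Schema.ini. The sketch assumes the caller already knows the column names, declares every column as Text for simplicity, and expects the folder path to end with a backslash.

```vba
' Hypothetical helper (assumption, not from the original answer):
' appends a Schema.ini section for one CSV file in the given folder.
Sub WriteSchemaIni(ByVal sFolder As String, ByVal sCsvName As String, _
                   ByVal vColumns As Variant, _
                   Optional ByVal sDelimiter As String = ";")
    Dim iFile As Integer
    Dim i As Long
    Dim n As Long

    iFile = FreeFile
    ' sFolder must end with "\" and be the folder that holds the CSV file
    Open sFolder & "Schema.ini" For Append As #iFile

    ' Section header is the CSV file name in square brackets
    Print #iFile, "[" & sCsvName & "]"
    Print #iFile, "Format=Delimited(" & sDelimiter & ")"

    ' One ColN line per column; all columns are typed as Text here
    n = 0
    For i = LBound(vColumns) To UBound(vColumns)
        n = n + 1
        Print #iFile, "Col" & n & "=" & vColumns(i) & " Text"
    Next i

    Close #iFile
End Sub
```

For the test file above, a call could be `WriteSchemaIni "c:\Temp\StackOverflow\", "UTF-8_Csv_With_BOM.csv", Array("Col_A", "Col_B", "Col_C")`. Because the file is opened in append mode, calling the helper twice for the same CSV file adds a duplicate section, so an existing Schema.ini may need to be deleted or checked first.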
