1041

How can I find all the files in a directory having the extension .txt in python?

0

25 Answers 25

3180

You can use glob:

import glob, os
os.chdir("/mydir")
for file in glob.glob("*.txt"):
    print(file)

or simply os.listdir:

import os
for file in os.listdir("/mydir"):
    if file.endswith(".txt"):
        print(os.path.join("/mydir", file))

or if you want to traverse directory, use os.walk:

import os
for root, dirs, files in os.walk("/mydir"):
    for file in files:
        if file.endswith(".txt"):
             print(os.path.join(root, file))
Sign up to request clarification or add additional context in comments.

2 Comments

chdir is not just unnecessary, it's highly undesirable since a function call is not supposed to irreversibly modify the current working directory. This may cause a lot of downstream errors.
while this might be a bug in an extremely niche case, discovered python 3.13.0 glob() and listdir() fail to find any files in macOS (15.1.1) inside squashfuse (0.5.2) w mountpoint created inside a SMB share.
368

Use glob.

>>> import glob
>>> glob.glob('./*.txt')
['./outline.txt', './pip-log.txt', './test.txt', './testingvim.txt']

2 Comments

Beware that glob can't find files recursively if your python is under 3.5. more inform
This only finds files in the current top level directory, not within the entire directory.
241

You can simply use pathlibs glob:

import pathlib

list(pathlib.Path('your_directory').glob('*.txt'))

or in a loop:

for txt_file in pathlib.Path('your_directory').glob('*.txt'):
    # do something with "txt_file"

If you want it recursive you can use rglob('*.txt')

This is like calling Path.glob() with “**/” added in front of the given relative pattern

4 Comments

Note that you can also use rglob if you want to look for items recursively. E.g. .rglob('*.txt')
what if we want to find all files except precisely those that end in .txt?
@gota get all files and then exclude the txt files set(pathlib.Path('.').iterdir()) - set(pathlib.Path().glob("*.txt"))
In this day and age, this should be the preferred solution. pathlib is so much more ergonomic than mucking about with os.
224

Something like that should do the job

for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith('.txt'):
            print(file)

2 Comments

Note that this is case sensitive (won't match .TXT or .Txt), so you'll probably want to do if file.lower().endswith('.txt'):
can't this be abbreviated to something like the following? for file in os.walk(directory).files: if file.endswith ...
163

Something like this will work:

>>> import os
>>> path = '/usr/share/cups/charmaps'
>>> text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
>>> text_files
['euc-cn.txt', 'euc-jp.txt', 'euc-kr.txt', 'euc-tw.txt', ... 'windows-950.txt']

1 Comment

You could use os.path.join on each element of text_files. It could be something like text_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.txt')].
56
import os

path = 'mypath/path' 
files = os.listdir(path)

files_txt = [i for i in files if i.endswith('.txt')]

Comments

37

I like os.walk():

import os

for root, dirs, files in os.walk(dir):
    for f in files:
        if os.path.splitext(f)[1] == '.txt':
            fullpath = os.path.join(root, f)
            print(fullpath)

Or with generators:

import os

fileiter = (os.path.join(root, f)
    for root, _, files in os.walk(dir)
    for f in files)
txtfileiter = (f for f in fileiter if os.path.splitext(f)[1] == '.txt')
for txt in txtfileiter:
    print(txt)

Comments

35

Here's more versions of the same that produce slightly different results:

glob.iglob()

import glob
for f in glob.iglob("/mydir/*/*.txt"): # generator, search immediate subdirectories 
    print f

glob.glob1()

print glob.glob1("/mydir", "*.tx?")  # literal_directory, basename_pattern

fnmatch.filter()

import fnmatch, os
print fnmatch.filter(os.listdir("/mydir"), "*.tx?") # include dot-files

Comments

29

Try this this will find all your files recursively:

import glob, os
os.chdir("H:\\wallpaper")# use whatever directory you want

#double\\ no single \

for file in glob.glob("**/*.txt", recursive = True):
    print(file)

Comments

26

Python v3.5+

Fast method using os.scandir in a recursive function. Searches for all files with a specified extension in folder and sub-folders. It is fast, even for finding 10,000s of files.

I have also included a function to convert the output to a Pandas Dataframe.

import os
import re
import pandas as pd
import numpy as np


def findFilesInFolderYield(path,  extension, containsTxt='', subFolders = True, excludeText = ''):
    """  Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    if type(containsTxt) == str: # if a string and not in a list
        containsTxt = [containsTxt]
    
    myregexobj = re.compile('\.' + extension + '$')    # Makes sure the file extension is at the end and is preceded by a .
    
    try:   # Trapping a OSError or FileNotFoundError:  File permissions problem I believe
        for entry in os.scandir(path):
            if entry.is_file() and myregexobj.search(entry.path): # 
    
                bools = [True for txt in containsTxt if txt in entry.path and (excludeText == '' or excludeText not in entry.path)]
    
                if len(bools)== len(containsTxt):
                    yield entry.stat().st_size, entry.stat().st_atime_ns, entry.stat().st_mtime_ns, entry.stat().st_ctime_ns, entry.path
    
            elif entry.is_dir() and subFolders:   # if its a directory, then repeat process as a nested function
                yield from findFilesInFolderYield(entry.path,  extension, containsTxt, subFolders)
    except OSError as ose:
        print('Cannot access ' + path +'. Probably a permissions error ', ose)
    except FileNotFoundError as fnf:
        print(path +' not found ', fnf)

def findFilesInFolderYieldandGetDf(path,  extension, containsTxt, subFolders = True, excludeText = ''):
    """  Converts returned data from findFilesInFolderYield and creates and Pandas Dataframe.
    Recursive function to find all files of an extension type in a folder (and optionally in all subfolders too)

    path:               Base directory to find files
    extension:          File extension to find.  e.g. 'txt'.  Regular expression. Or  'ls\d' to match ls1, ls2, ls3 etc
    containsTxt:        List of Strings, only finds file if it contains this text.  Ignore if '' (or blank)
    subFolders:         Bool.  If True, find files in all subfolders under path. If False, only searches files in the specified folder
    excludeText:        Text string.  Ignore if ''. Will exclude if text string is in path.
    """
    
    fileSizes, accessTimes, modificationTimes, creationTimes , paths  = zip(*findFilesInFolderYield(path,  extension, containsTxt, subFolders))
    df = pd.DataFrame({
            'FLS_File_Size':fileSizes,
            'FLS_File_Access_Date':accessTimes,
            'FLS_File_Modification_Date':np.array(modificationTimes).astype('timedelta64[ns]'),
            'FLS_File_Creation_Date':creationTimes,
            'FLS_File_PathName':paths,
                  })
    
    df['FLS_File_Modification_Date'] = pd.to_datetime(df['FLS_File_Modification_Date'],infer_datetime_format=True)
    df['FLS_File_Creation_Date'] = pd.to_datetime(df['FLS_File_Creation_Date'],infer_datetime_format=True)
    df['FLS_File_Access_Date'] = pd.to_datetime(df['FLS_File_Access_Date'],infer_datetime_format=True)

    return df

ext =   'txt'  # regular expression 
containsTxt=[]
path = 'C:\myFolder'
df = findFilesInFolderYieldandGetDf(path,  ext, containsTxt, subFolders = True)

Comments

22

path.py is another alternative: https://github.com/jaraco/path.py

from path import path
p = path('/path/to/the/directory')
for f in p.files(pattern='*.txt'):
    print f

Comments

22

To get all '.txt' file names inside 'dataPath' folder as a list in a Pythonic way:

from os import listdir
from os.path import isfile, join
path = "/dataPath/"
onlyTxtFiles = [f for f in listdir(path) if isfile(join(path, f)) and  f.endswith(".txt")]
print onlyTxtFiles

Comments

19

Python has all tools to do this:

import os

the_dir = 'the_dir_that_want_to_search_in'
all_txt_files = filter(lambda x: x.endswith('.txt'), os.listdir(the_dir))

Comments

16

I did a test (Python 3.6.4, W7x64) to see which solution is the fastest for one folder, no subdirectories, to get a list of complete file paths for files with a specific extension.

To make it short, for this task os.listdir() is the fastest and is 1.7x as fast as the next best: os.walk() (with a break!), 2.7x as fast as pathlib, 3.2x faster than os.scandir() and 3.3x faster than glob.
Please keep in mind, that those results will change when you need recursive results. If you copy/paste one method below, please add a .lower() otherwise .EXT would not be found when searching for .ext.

import os
import pathlib
import timeit
import glob

def a():
    path = pathlib.Path().cwd()
    list_sqlite_files = [str(f) for f in path.glob("*.sqlite")]

def b(): 
    path = os.getcwd()
    list_sqlite_files = [f.path for f in os.scandir(path) if os.path.splitext(f)[1] == ".sqlite"]

def c():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(".sqlite")]

def d():
    path = os.getcwd()
    os.chdir(path)
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob("*.sqlite")]

def e():
    path = os.getcwd()
    list_sqlite_files = [os.path.join(path, f) for f in glob.glob1(str(path), "*.sqlite")]

def f():
    path = os.getcwd()
    list_sqlite_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(".sqlite"):
                list_sqlite_files.append( os.path.join(root, file) )
        break



print(timeit.timeit(a, number=1000))
print(timeit.timeit(b, number=1000))
print(timeit.timeit(c, number=1000))
print(timeit.timeit(d, number=1000))
print(timeit.timeit(e, number=1000))
print(timeit.timeit(f, number=1000))

Results:

# Python 3.6.4
0.431
0.515
0.161
0.548
0.537
0.274

Comments

13

To get a list of ".txt" file names from a folder called "data" in the same directory I usually use this simple line of code:

import os
fileNames = [fileName for fileName in os.listdir("data") if fileName.endswith(".txt")]

1 Comment

Respect for the one-liner!
9
import os
import sys 

if len(sys.argv)==2:
    print('no params')
    sys.exit(1)

dir = sys.argv[1]
mask= sys.argv[2]

files = os.listdir(dir); 

res = filter(lambda x: x.endswith(mask), files); 

print res

Comments

7

This code makes my life simpler.

import os
fnames = ([file for root, dirs, files in os.walk(dir)
    for file in files
    if file.endswith('.txt') #or file.endswith('.png') or file.endswith('.pdf')
    ])
for fname in fnames: print(fname)

Comments

6

A copy-pastable solution similar to the one of ghostdog:

def get_all_filepaths(root_path, ext):
    """
    Search all files which have a given extension within root_path.

    This ignores the case of the extension and searches subdirectories, too.

    Parameters
    ----------
    root_path : str
    ext : str

    Returns
    -------
    list of str

    Examples
    --------
    >>> get_all_filepaths('/run', '.lock')
    ['/run/unattended-upgrades.lock',
     '/run/mlocate.daily.lock',
     '/run/xtables.lock',
     '/run/mysqld/mysqld.sock.lock',
     '/run/postgresql/.s.PGSQL.5432.lock',
     '/run/network/.ifstate.lock',
     '/run/lock/asound.state.lock']
    """
    import os
    all_files = []
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                all_files.append(os.path.join(root, filename))
    return all_files

You can also use yield to create a generator and thus avoid assembling the complete list:

def get_all_filepaths(root_path, ext):
    import os
    for root, dirs, files in os.walk(root_path):
        for filename in files:
            if filename.lower().endswith(ext):
                yield os.path.join(root, filename)

Comments

5

Use fnmatch: https://docs.python.org/2/library/fnmatch.html

import fnmatch
import os

for file in os.listdir('.'):
    if fnmatch.fnmatch(file, '*.txt'):
        print file

Comments

4

I suggest you to use fnmatch and the upper method. In this way you can find any of the following:

  1. Name.txt;
  2. Name.TXT;
  3. Name.Txt

.

import fnmatch
import os

    for file in os.listdir("/Users/Johnny/Desktop/MyTXTfolder"):
        if fnmatch.fnmatch(file.upper(), '*.TXT'):
            print(file)

Comments

4

Here's one with extend()

types = ('*.jpg', '*.png')
images_list = []
for files in types:
    images_list.extend(glob.glob(os.path.join(path, files)))

Comments

3

Functional solution with sub-directories:

from fnmatch import filter
from functools import partial
from itertools import chain
from os import path, walk

print(*chain(*(map(partial(path.join, root), filter(filenames, "*.txt")) for root, _, filenames in walk("mydir"))))

Comments

2

In case the folder contains a lot of files or memory is an constraint, consider using generators:

def yield_files_with_extensions(folder_path, file_extension):
   for _, _, files in os.walk(folder_path):
       for file in files:
           if file.endswith(file_extension):
               yield file

Option A: Iterate

for f in yield_files_with_extensions('.', '.txt'): 
    print(f)

Option B: Get all

files = [f for f in yield_files_with_extensions('.', '.txt')]

Comments

1

use Python OS module to find files with specific extension.

the simple example is here :

import os

# This is the path where you want to search
path = r'd:'  

# this is extension you want to detect
extension = '.txt'   # this can be : .jpg  .png  .xls  .log .....

for root, dirs_list, files_list in os.walk(path):
    for file_name in files_list:
        if os.path.splitext(file_name)[-1] == extension:
            file_name_path = os.path.join(root, file_name)
            print file_name
            print file_name_path   # This is the full path of the filter file

Comments

1

Many users have replied with os.walk answers, which includes all files but also all directories and subdirectories and their files.

import os


def files_in_dir(path, extension=''):
    """
       Generator: yields all of the files in <path> ending with
       <extension>

       \param   path       Absolute or relative path to inspect,
       \param   extension  [optional] Only yield files matching this,

       \yield              [filenames]
    """


    for _, dirs, files in os.walk(path):
        dirs[:] = []  # do not recurse directories.
        yield from [f for f in files if f.endswith(extension)]

# Example: print all the .py files in './python'
for filename in files_in_dir('./python', '*.py'):
    print("-", filename)

Or for a one off where you don't need a generator:

path, ext = "./python", ext = ".py"
for _, _, dirfiles in os.walk(path):
    matches = (f for f in dirfiles if f.endswith(ext))
    break

for filename in matches:
    print("-", filename)

If you are going to use matches for something else, you may want to make it a list rather than a generator expression:

    matches = [f for f in dirfiles if f.endswith(ext)]

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.