I have a Python script for text substitution, written by a friend, which works on his system (Ubuntu Focal).

The following is the script:

#!/usr/bin/env python3
"""
Script Name: replace_text.py
Purpose: This Python script performs text substitution in files within a given directory.
It replaces specific characters as per predefined substitutions, providing a convenient way to modify text files.

Usage:
python replace_text.py /path/to/your/directory

Note:
- Ensure you have Python installed on your system.
- The script processes all files within the specified directory and its subdirectories.
- Files are modified in-place, so have a backup if needed.
"""

import os
import sys

def replace_text_in_files(directory):
    # Character substitutions
    substitutions = {
        '': 'fi',
        '': 'fl',
        'ä': 'ā',
        'é': 'ī',
        'ü': 'ū',
        'å': 'ṛ',
        'è': 'ṝ',
        'ì': 'ṅ',
        'ñ': 'ṣ',
        'ï': 'ñ',
        'ö': 'ṭ',
        'ò': 'ḍ',
        'ë': 'ṇ',
        'ç': 'ś',
        'à': 'ṁ',
        'ù': 'ḥ',
        'ÿ': 'ḷ',
        'û': 'ḹ',
        'Ä': 'Ā',
        'É': 'Ī',
        'Ü': 'Ū',
        'Å': 'Ṛ',
        'È': 'Ṝ',
        'Ì': 'Ṅ',
        'Ñ': 'Ṣ',
        'Ï': 'Ñ',
        'Ö': 'Ṭ',
        'Ò': 'Ḍ',
        'Ë': 'Ṇ',
        'Ç': 'Ś',
        'À': 'Ṁ',
        'Ù': 'Ḥ',
        'ß': 'Ḷ',
        '“': '“',
        '”': '”',
        ' ': ' ',
        '‘': '‘',
        '–': '-',
        '’': '’',
        '—': '—',
        '•': '»',
        '…': '...',
    }

    # Walk through the directory and its subdirectories
    for root, dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, 'r', encoding='utf-8') as file:
                file_content = file.read()
            
            # Perform substitutions
            for original, replacement in substitutions.items():
                file_content = file_content.replace(original, replacement)

            # Write the modified content back to the file
            with open(file_path, 'w', encoding='utf-8') as file:
                file.write(file_content)

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python replace_text.py /path/to/your/directory")
        sys.exit(1)

    directory_path = sys.argv[1]
    replace_text_in_files(directory_path)
    print("Text substitution completed successfully.")

I use Devuan Daedalus, which is based on Debian 12 but without systemd. When I run this script on my machine, I get the following error:

~/Documents/software-related/software-files$ python3 replace_text.py ~/Desktop/test-dir/
Traceback (most recent call last):
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 89, in <module>
    replace_text_in_files(directory_path)
  File "/home/vrgovinda/Documents/software-related/software-files/replace_text.py", line 73, in replace_text_in_files
    file_content = file.read()
                   ^^^^^^^^^^^
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 41: invalid start byte

He doesn't have any clue about this, and I know nothing about Python. Hence I seek the help of those on this forum who are knowledgeable.

I took the suggestion by Ofer Sadan to open the file as a bytes file. But that gives me another error:

 binary mode doesn't take an encoding argument

Please ask for additional information if:

  1. This question seems too vague/open-ended/generic.
  2. I haven't provided sufficient information.

Thanks,

  • It’s hard to guess what’s happening without looking at the file. But if you want to open the file as binary, you need to delete the encoding="utf-8" argument: with open(path, "rb") as file: Commented Nov 18, 2023 at 20:29
  • Oh yes. I tried this script on .doc and .docx files which aren't UTF-8 encoded I guess. When I did try the script on plain text files, the script works flawlessly. Commented Nov 18, 2023 at 21:19
  • BTW, if I delete the encoding="utf-8" argument and use with open(path, "rb") as file: will this script work on .doc and .docx? Commented Nov 18, 2023 at 21:22
  • No, the script would not work on .doc and .docx files directly. Those are compressed files and the plain text characters don't exist in them directly. P.S. the fact that those were the files you were trying to process should have been part of the question from the start. Commented Nov 19, 2023 at 14:20

2 Answers

Python complains because, while decoding the bytes of one of the files as UTF-8, it reaches a byte that is not valid UTF-8. Are you sure this file is actually UTF-8 encoded? https://www.charset.org/utf-8

Reading the file as binary will give you the actual bytes, but you want to substitute characters. You would then have to decode the bytes to a string with the UTF-8 codec, I guess, and you would end up with the same error.
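A minimal sketch of that point, using a made-up byte string that contains 0xad (the byte from the traceback). The sample data and variable names are assumptions for illustration only:

```python
# Hypothetical sample: 0xad is never a valid first byte of a UTF-8 sequence.
data = b"plain text \xad more text"

error = None
try:
    # Decoding raw bytes raises the same exception as open(..., encoding="utf-8").
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(f"decode failed at byte {exc.start}: {exc.reason}")

# errors="replace" avoids the exception but swaps bad bytes for U+FFFD.
# That is lossy, so writing the result back would damage a binary file.
lossy = data.decode("utf-8", errors="replace")
```

So switching to binary mode only moves the failure to the decode step; it does not make a non-UTF-8 file safe to edit as text.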

I would be extra careful in your case (make a backup): you may be tampering with an actual binary file. Are you sure the file you are touching is meant to be modified?

2 Comments

But the same script with the same files works on Ubuntu Focal.
Maybe you could add logging to identify the offending file? I would also like to know how the files were copied from one system to the other.
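One way to find the offending file, as a rough sketch: walk the directory read-only and collect every file that fails to decode as UTF-8. The function name find_non_utf8_files is hypothetical and not part of the original script:

```python
import os

def find_non_utf8_files(directory):
    """Return (path, error message) pairs for files that are not valid UTF-8."""
    problems = []
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            try:
                # Read only; nothing is written back to disk.
                with open(file_path, "r", encoding="utf-8") as handle:
                    handle.read()
            except UnicodeDecodeError as exc:
                problems.append((file_path, str(exc)))
    return problems
```

Calling find_non_utf8_files on the test directory and printing each pair would show which files need to be excluded or converted before running the substitution script.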

I had run the script on .doc and .docx files, which I guess aren't UTF-8 encoded. When I tried the script on plain text files, it worked flawlessly.

Sorry if I have wasted your time.

Thanks for your contributions @frederic-laurencin and @elbashmubarmeg

1 Comment

.docx files are a group of XML files wrapped in a .zip. If you unzipped it, some of those individual files could be processed as text. I don't know what encoding they would be in though.
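As a hedged sketch of that idea, Python's standard zipfile module can list and read the XML parts of a .docx without unzipping it by hand. The helper names are hypothetical, and decoding as UTF-8 is an assumption: the XML declaration at the top of each part states its actual encoding.

```python
import zipfile

def list_xml_parts(docx_path):
    """Return the names of the XML members inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist() if name.endswith(".xml")]

def read_part_as_text(docx_path, member, encoding="utf-8"):
    """Read one archive member and decode it (UTF-8 is assumed, not guaranteed)."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read(member).decode(encoding)
```

In a typical Word file the main body is the member word/document.xml. Running character substitutions over raw XML can also alter markup, and rewriting the archive is a separate step, so this only covers inspection.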
