Simple thing: I want to be able to break builds via checkstyle, if selected files (.java, .xml) are not encoded properly (I want to enforce UTF-8 in source files).

I'm currently using checkstyle for a number of other build-breakers, like enforcing correct LineFeeds and/or usage of Tab Characters, but there does not seem to be something like a FileEncodingChecker.

Question: If checkstyle simply can't do this: is there another plugin that might do this job?

1 Answer

Maven encoding (sources and resources) is handled by the standard project.build.sourceEncoding property, which should indeed be present and set to UTF-8 as a good practice.
From the official documentation of the maven-resources-plugin:

The best practice is to define encoding for copying filtered resources via the property ${project.build.sourceEncoding} which should be defined in the pom properties section

This property is picked up as default value of the encoding property of the maven-compiler-plugin and the encoding property of the maven-resources-plugin.
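
As a minimal sketch, the property described above would sit in the properties section of the pom.xml like this (the comment is only illustrative):

```xml
<properties>
    <!-- Read by the compiler and resources plugins as their default encoding -->
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
```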


To further enforce its presence, you could then use the maven-enforcer-plugin and its requireProperty rule, which checks both that the project.build.sourceEncoding property exists and that its value is UTF-8. That is, the build would fail if the property was not set OR did not have this exact value.

Below is an example of such a configuration, to be added to the build/plugins section of your pom.xml file:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>1.4.1</version>
    <executions>
        <execution>
            <id>enforce-property</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <requireProperty>
                        <property>project.build.sourceEncoding</property>
                        <message>Encoding must be set and at UTF-8!</message>
                        <regex>UTF-8</regex>
                        <regexMessage>Encoding must be set and at UTF-8</regexMessage>
                    </requireProperty>
                </rules>
                <fail>true</fail>
            </configuration>
        </execution>
    </executions>
</plugin>

Note, the same could be done for the project.reporting.outputEncoding property.
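
A sketch of what that second check might look like, assuming it is added as a sibling rule inside the same rules element of the enforcer configuration (the message texts are placeholders):

```xml
<requireProperty>
    <!-- Same rule type as above, pointed at the reporting encoding -->
    <property>project.reporting.outputEncoding</property>
    <message>Reporting encoding must be set and at UTF-8!</message>
    <regex>UTF-8</regex>
    <regexMessage>Reporting encoding must be set and at UTF-8</regexMessage>
</requireProperty>
```

With both rules in place, the build should fail if either property is missing or holds a different value.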


Bonus: since we are on Stack Overflow, the CEO would probably be happy to see his old article back again: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets


Test
Given the following Java code:

package com.sample;

public class Main {

    public void 漢字() {
    }

}

and setting the following in Maven:

<properties>
    <project.build.sourceEncoding>US-ASCII</project.build.sourceEncoding>
</properties>

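This would actually make the build fail: US-ASCII is a 7-bit encoding that cannot represent the characters in the method name, which results in illegal character errors. The same would not happen with UTF-8, a variable-width encoding that uses one to four 8-bit bytes per character and can represent every Unicode character.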

2 Comments

@a-di-matteo: Thanks for this extensive answer, works perfectly for me. I'm still feeling a bit unhappy about using two plugins for achieving some (IMHO) simple build breaking scenario.
I tried replacing the regex/property value with ISO-8859-1 and also added the plugin to the root pom, as I want to enforce this encoding on my project, but it didn't work. I have a multi-module project; am I missing something?
