1

I'm trying to read some files from HDFS and/or from the filesystem and I get this exception

    Driver stacktrace:]
            [unread block data]
    ]org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, C-4073.CM.ES, executor 1): java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:222)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

The code reading from HDFS:

    JavaRDD<String> textFile = sc.textFile("file:///PathToFile");

The code reading from the filesystem:

    JavaRDD<String> textFile=sc.textFile("hdfs:///PathToFile");

I've been searching, and users usually said that it might be an error due to different Java versions, but I've checked it:

My Cluster:

$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

My local machine:

$ java -version
java version "1.7.0_76"
Java(TM) SE Runtime Environment (build 1.7.0_76-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.76-b04, mixed mode)

My pom.xml:

<properties>
    <jdk.version>1.7</jdk.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    <springboot.version>1.5.3.RELEASE</springboot.version>
    <springint.version>4.3.10.RELEASE</springint.version>
    <cdh.version>5.10.1</cdh.version>
    <solr.version>4.10.3-cdh${cdh.version}</solr.version>
    <hbase.version>1.2.0-cdh${cdh.version}</hbase.version>
    <kafka.version>0.9.0-kafka-2.0.2</kafka.version>
    <rt-framework.version>2.3.5</rt-framework.version>
    <tas.version>4.0.0</tas.version>
</properties>

I'm not sure if my problem is about the Java versions, because I've no problem writing/reading from kafka, or querying from hive.

Thanks you in advance and sorry for my bad english.

2
  • Are you using the same version of Java to serialize and deserialize your objects? Commented Oct 18, 2017 at 13:58
  • Yes, I've got same Java versions everywhere, but in this example, I'm not even serializing or deserializing anything, just trying to read the files. Commented Oct 18, 2017 at 14:10

2 Answers 2

1

Ok, those two lines are super-different.

JavaRDD<String> textFile = sc.textFile("file:///PathToFile");
JavaRDD<String> textFile=sc.textFile("hdfs:///PathToFile");

First line ("file:///...") assumes that your file is available on all the machines under the same location and that those files are actually exactly the same. Otherwise all kind of creepy stuff will happen during partitioning/reading.

Second line means that you try to read from preconfigured HDFS and it is actually OK.

If you want to read some local file on master machine just do something like this:

List<String> myData = ...
JavaRDD<String> myRdd = sc.parallelize(myData);

More details are available here: https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/SparkContext.html#parallelize-scala.collection.Seq-int-scala.reflect.ClassTag-

Sign up to request clarification or add additional context in comments.

1 Comment

Hi! I know both lines were different, I just tried to show that I got that error no matter where I'm trying to read just to show it isnt just an HDFS problem, in a productive enviroment, I'll read from HDFS. I will edit my question so it will be clear in that point. But thank you so much for the answer.
0

Finnally, I found the solution, it seems it was a problem with dependencies, specifically in the compressed jar:

/META-INF/spring.factories

I'd to mannually add all dependencies, I'm not sure if this is the best solution, but at least, it works.

This is my new spring.factories:

    #beans ----------------------------------------------------------------------------------------------------------------------
    org.springframework.beans.BeanInfoFactory=org.springframework.beans.ExtendedBeanInfoFactory

    #boot  -----------------------------------------------------------------------------------------------------------------------
    # PropertySource Loaders
    org.springframework.boot.env.PropertySourceLoader=\
    org.springframework.boot.env.PropertiesPropertySourceLoader,\
    org.springframework.boot.env.YamlPropertySourceLoader

    # Run Listeners
    org.springframework.boot.SpringApplicationRunListener=\
    org.springframework.boot.context.event.EventPublishingRunListener

    # Application Context Initializers
    org.springframework.context.ApplicationContextInitializer=\
    org.springframework.boot.context.ConfigurationWarningsApplicationContextInitializer,\
    org.springframework.boot.context.ContextIdApplicationContextInitializer,\
    org.springframework.boot.context.config.DelegatingApplicationContextInitializer,\
    org.springframework.boot.context.embedded.ServerPortInfoApplicationContextInitializer

    # Application Listeners
    org.springframework.context.ApplicationListener=\
    org.springframework.boot.ClearCachesApplicationListener,\
    org.springframework.boot.builder.ParentContextCloserApplicationListener,\
    org.springframework.boot.context.FileEncodingApplicationListener,\
    org.springframework.boot.context.config.AnsiOutputApplicationListener,\
    org.springframework.boot.context.config.ConfigFileApplicationListener,\
    org.springframework.boot.context.config.DelegatingApplicationListener,\
    org.springframework.boot.liquibase.LiquibaseServiceLocatorApplicationListener,\
    org.springframework.boot.logging.ClasspathLoggingApplicationListener,\
    org.springframework.boot.logging.LoggingApplicationListener

    # Environment Post Processors
    org.springframework.boot.env.EnvironmentPostProcessor=\
    org.springframework.boot.cloud.CloudFoundryVcapEnvironmentPostProcessor,\
    org.springframework.boot.env.SpringApplicationJsonEnvironmentPostProcessor

    # Failure Analyzers
    org.springframework.boot.diagnostics.FailureAnalyzer=\
    org.springframework.boot.diagnostics.analyzer.BeanCurrentlyInCreationFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.BeanNotOfRequiredTypeFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.BindFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.ConnectorStartFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.NoUniqueBeanDefinitionFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.PortInUseFailureAnalyzer,\
    org.springframework.boot.diagnostics.analyzer.ValidationExceptionFailureAnalyzer

    # FailureAnalysisReporters
    org.springframework.boot.diagnostics.FailureAnalysisReporter=\
    org.springframework.boot.diagnostics.LoggingFailureAnalysisReporter




    # boot autoconfigure ---------------------------------------------------------------------------------------------------------------------
    # Initializers
    org.springframework.context.ApplicationContextInitializer=\
    org.springframework.boot.autoconfigure.SharedMetadataReaderFactoryContextInitializer,\
    org.springframework.boot.autoconfigure.logging.AutoConfigurationReportLoggingInitializer

    # Application Listeners
    org.springframework.context.ApplicationListener=\
    org.springframework.boot.autoconfigure.BackgroundPreinitializer

    # Auto Configuration Import Listeners
    org.springframework.boot.autoconfigure.AutoConfigurationImportListener=\
    org.springframework.boot.autoconfigure.condition.ConditionEvaluationReportAutoConfigurationImportListener

    # Auto Configuration Import Filters
    org.springframework.boot.autoconfigure.AutoConfigurationImportFilter=\
    org.springframework.boot.autoconfigure.condition.OnClassCondition

    # Auto Configure
    org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
    org.springframework.boot.autoconfigure.admin.SpringApplicationAdminJmxAutoConfiguration,\
    org.springframework.boot.autoconfigure.aop.AopAutoConfiguration,\
    org.springframework.boot.autoconfigure.amqp.RabbitAutoConfiguration,\
    org.springframework.boot.autoconfigure.batch.BatchAutoConfiguration,\
    org.springframework.boot.autoconfigure.cache.CacheAutoConfiguration,\
    org.springframework.boot.autoconfigure.cassandra.CassandraAutoConfiguration,\
    org.springframework.boot.autoconfigure.cloud.CloudAutoConfiguration,\
    org.springframework.boot.autoconfigure.context.ConfigurationPropertiesAutoConfiguration,\
    org.springframework.boot.autoconfigure.context.MessageSourceAutoConfiguration,\
    org.springframework.boot.autoconfigure.context.PropertyPlaceholderAutoConfiguration,\
    org.springframework.boot.autoconfigure.couchbase.CouchbaseAutoConfiguration,\
    org.springframework.boot.autoconfigure.dao.PersistenceExceptionTranslationAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.cassandra.CassandraDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.cassandra.CassandraRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.couchbase.CouchbaseDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.couchbase.CouchbaseRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.jpa.JpaRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.ldap.LdapDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.ldap.LdapRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.mongo.MongoDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.mongo.MongoRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.neo4j.Neo4jDataAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.neo4j.Neo4jRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.solr.SolrRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.redis.RedisAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.redis.RedisRepositoriesAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.rest.RepositoryRestMvcAutoConfiguration,\
    org.springframework.boot.autoconfigure.data.web.SpringDataWebAutoConfiguration,\
    org.springframework.boot.autoconfigure.elasticsearch.jest.JestAutoConfiguration,\
    org.springframework.boot.autoconfigure.freemarker.FreeMarkerAutoConfiguration,\
    org.springframework.boot.autoconfigure.gson.GsonAutoConfiguration,\
    org.springframework.boot.autoconfigure.h2.H2ConsoleAutoConfiguration,\
    org.springframework.boot.autoconfigure.hateoas.HypermediaAutoConfiguration,\
    org.springframework.boot.autoconfigure.hazelcast.HazelcastAutoConfiguration,\
    org.springframework.boot.autoconfigure.hazelcast.HazelcastJpaDependencyAutoConfiguration,\
    org.springframework.boot.autoconfigure.info.ProjectInfoAutoConfiguration,\
    org.springframework.boot.autoconfigure.integration.IntegrationAutoConfiguration,\
    org.springframework.boot.autoconfigure.jackson.JacksonAutoConfiguration,\
    org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration,\
    org.springframework.boot.autoconfigure.jdbc.JdbcTemplateAutoConfiguration,\
    org.springframework.boot.autoconfigure.jdbc.JndiDataSourceAutoConfiguration,\
    org.springframework.boot.autoconfigure.jdbc.XADataSourceAutoConfiguration,\
    org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration,\
    org.springframework.boot.autoconfigure.jms.JmsAutoConfiguration,\
    org.springframework.boot.autoconfigure.jmx.JmxAutoConfiguration,\
    org.springframework.boot.autoconfigure.jms.JndiConnectionFactoryAutoConfiguration,\
    org.springframework.boot.autoconfigure.jms.activemq.ActiveMQAutoConfiguration,\
    org.springframework.boot.autoconfigure.jms.artemis.ArtemisAutoConfiguration,\
    org.springframework.boot.autoconfigure.flyway.FlywayAutoConfiguration,\
    org.springframework.boot.autoconfigure.groovy.template.GroovyTemplateAutoConfiguration,\
    org.springframework.boot.autoconfigure.jersey.JerseyAutoConfiguration,\
    org.springframework.boot.autoconfigure.jooq.JooqAutoConfiguration,\
    org.springframework.boot.autoconfigure.kafka.KafkaAutoConfiguration,\
    org.springframework.boot.autoconfigure.ldap.embedded.EmbeddedLdapAutoConfiguration,\
    org.springframework.boot.autoconfigure.ldap.LdapAutoConfiguration,\
    org.springframework.boot.autoconfigure.liquibase.LiquibaseAutoConfiguration,\
    org.springframework.boot.autoconfigure.mail.MailSenderAutoConfiguration,\
    org.springframework.boot.autoconfigure.mail.MailSenderValidatorAutoConfiguration,\
    org.springframework.boot.autoconfigure.mobile.DeviceResolverAutoConfiguration,\
    org.springframework.boot.autoconfigure.mobile.DeviceDelegatingViewResolverAutoConfiguration,\
    org.springframework.boot.autoconfigure.mobile.SitePreferenceAutoConfiguration,\
    org.springframework.boot.autoconfigure.mongo.embedded.EmbeddedMongoAutoConfiguration,\
    org.springframework.boot.autoconfigure.mongo.MongoAutoConfiguration,\
    org.springframework.boot.autoconfigure.mustache.MustacheAutoConfiguration,\
    org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration,\
    org.springframework.boot.autoconfigure.reactor.ReactorAutoConfiguration,\
    org.springframework.boot.autoconfigure.security.SecurityAutoConfiguration,\
    org.springframework.boot.autoconfigure.security.SecurityFilterAutoConfiguration,\
    org.springframework.boot.autoconfigure.security.FallbackWebSecurityAutoConfiguration,\
    org.springframework.boot.autoconfigure.security.oauth2.OAuth2AutoConfiguration,\
    org.springframework.boot.autoconfigure.sendgrid.SendGridAutoConfiguration,\
    org.springframework.boot.autoconfigure.session.SessionAutoConfiguration,\
    org.springframework.boot.autoconfigure.social.SocialWebAutoConfiguration,\
    org.springframework.boot.autoconfigure.social.FacebookAutoConfiguration,\
    org.springframework.boot.autoconfigure.social.LinkedInAutoConfiguration,\
    org.springframework.boot.autoconfigure.social.TwitterAutoConfiguration,\
    org.springframework.boot.autoconfigure.solr.SolrAutoConfiguration,\
    org.springframework.boot.autoconfigure.thymeleaf.ThymeleafAutoConfiguration,\
    org.springframework.boot.autoconfigure.transaction.TransactionAutoConfiguration,\
    org.springframework.boot.autoconfigure.transaction.jta.JtaAutoConfiguration,\
    org.springframework.boot.autoconfigure.validation.ValidationAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.DispatcherServletAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.EmbeddedServletContainerAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.ErrorMvcAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.HttpEncodingAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.HttpMessageConvertersAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.MultipartAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.ServerPropertiesAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.WebClientAutoConfiguration,\
    org.springframework.boot.autoconfigure.web.WebMvcAutoConfiguration,\
    org.springframework.boot.autoconfigure.websocket.WebSocketAutoConfiguration,\
    org.springframework.boot.autoconfigure.websocket.WebSocketMessagingAutoConfiguration,\
    org.springframework.boot.autoconfigure.webservices.WebServicesAutoConfiguration,\
    com.MY.MAIN.CLASS


    # Failure analyzers
    org.springframework.boot.diagnostics.FailureAnalyzer=\
    org.springframework.boot.autoconfigure.diagnostics.analyzer.NoSuchBeanDefinitionFailureAnalyzer,\
    org.springframework.boot.autoconfigure.jdbc.DataSourceBeanCreationFailureAnalyzer,\
    org.springframework.boot.autoconfigure.jdbc.HikariDriverConfigurationFailureAnalyzer

    # Template availability providers
    org.springframework.boot.autoconfigure.template.TemplateAvailabilityProvider=\
    org.springframework.boot.autoconfigure.freemarker.FreeMarkerTemplateAvailabilityProvider,\
    org.springframework.boot.autoconfigure.mustache.MustacheTemplateAvailabilityProvider,\
    org.springframework.boot.autoconfigure.groovy.template.GroovyTemplateAvailabilityProvider,\
    org.springframework.boot.autoconfigure.thymeleaf.ThymeleafTemplateAvailabilityProvider,\
    org.springframework.boot.autoconfigure.web.JspTemplateAvailabilityProvider

Probabilly it doesn't needs all the dependencies, so I will check it and update my answer in the future.

Thank you all for the answers.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.