1

The code below prints an array of file names.

  val pdfFileArray = getFiles()
  for(fileName <- pdfFileArray){
    println(fileName)
  }

I'm trying to convert this Array (pdfFileArray) into an array which contains unique file name extensions.

Is something like the code below the correct way to do this in Scala?

  Set<String> fileNameSet = new HashSet<String>
  val pdfFileArray = getFiles()
  for(fileName <- pdfFileArray){
    String extension = fileName.substring(fileName.lastIndexOf('.'));
    fileNameSet.add(extension)
  }

5 Answers

2

This properly handles files with no extension (by ignoring them):

val extensions = getFiles().map{_.split('.').tail.lastOption}.flatten.distinct

so

Array("foo.jpg", "bar.jpg", "baz.png", "foobar")

becomes

Array("jpg", "png")
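As a rough sketch of the same idea, the one-liner can be wrapped in a small function and written with flatMap instead of map followed by flatten. The helper name uniqueExtensions and the sample file names are made up for illustration; getFiles() from the question is not used here.

```scala
// Hypothetical helper: returns the distinct extensions of the given file names.
// A name without a '.' splits into a single element, so .tail is empty,
// lastOption is None, and flatMap drops it.
def uniqueExtensions(fileNames: Seq[String]): Seq[String] =
  fileNames.flatMap(_.split('.').tail.lastOption).distinct

// Sample data, made up for illustration.
val sampleFiles = Seq("foo.jpg", "bar.jpg", "baz.png", "foobar", "archive.tar.gz")

// distinct keeps the first occurrence of each extension, in input order.
val sampleExtensions = uniqueExtensions(sampleFiles)
```

Only the last segment counts as the extension here, so "archive.tar.gz" contributes "gz".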

3 Comments

I think this wouldn't work. split returns an array of one element (the whole string) if it doesn't find the separator. So lastOption will always be a Some. That means you'd get Array("jpg", "png", "foobar")
ah you're right, however .tail.lastOption does work, I'll edit my answer.
@Dan, better to use flatMap: files.flatMap(_.split('.').tail.lastOption).distinct
1

You could do this:

val fileNameSet = pdfFileArray.groupBy(_.split('.').last).keys

This assumes that all your file names have an extension and that you only want the last one. For example, something.html.erb has the extension 'erb'.
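To make that assumption concrete, here is a small sketch with made-up sample names showing what the groupBy approach returns when one name has no extension. The names mixedNames and lastSegments are chosen for illustration only.

```scala
// Sample data, made up for illustration.
val mixedNames = Array("index.html", "something.html.erb", "README")

// groupBy keys each name by its last '.'-separated segment.
// .keys is an Iterable, so it is converted to a Set for easy comparison.
val lastSegments = mixedNames.groupBy(_.split('.').last).keys.toSet
```

A name without a dot splits into a single element, so the whole name ("README" here) ends up among the keys as if it were an extension.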

1 Comment

Thanks, but I think your code needs to be amended slightly: pdfFileArray.groupBy(_.getName().split('.').last).keys
1

Scala's collections have a method called distinct, which removes all duplicate entries from a collection. For instance:

scala> List(1, 2, 3, 1, 2).distinct
res3: List[Int] = List(1, 2, 3)

Is that what you're looking for?

2 Comments

That will give you distinct by whole filenames, how about extensions?
Map the array to its extensions and then call distinct: array.map(f => f.substring(f.lastIndexOf('.') + 1)).distinct.
1

For the sake of completeness:

List("foo.jpg", "bar.jpg").map(_.takeRight(3)).toSet

Here I'm assuming that all extensions are 3 characters long. Converting to a Set, like the .distinct method used in other answers (which, by the way, uses a mutable set underneath), gives you unique items.
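A quick sketch of where the 3-character assumption breaks, next to a variable-width alternative based on lastIndexOf. The sample names and the value names (variedNames, fixedWidth, afterLastDot) are made up for illustration.

```scala
// Sample data, made up for illustration: extensions of 3, 4 and 1 characters.
val variedNames = List("foo.jpg", "photo.jpeg", "main.c")

// Fixed-width approach: correct only when every extension is 3 characters long.
val fixedWidth = variedNames.map(_.takeRight(3)).toSet

// Variable-width alternative: keep names that contain a '.',
// then take everything after the last one.
val afterLastDot = variedNames
  .filter(_.lastIndexOf('.') >= 0)
  .map(name => name.substring(name.lastIndexOf('.') + 1))
  .toSet
```

With this input, takeRight(3) cuts "photo.jpeg" down to "peg" and pulls part of the base name into "n.c", while the lastIndexOf version yields "jpeg" and "c".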

2 Comments

why does it matter if Set "uses mutable set underneath" ?
@user470184, well, this part actually relates to .distinct, not Set, sorry, if I confused you
1

You can also do it with regex, which gives a more general solution because you can redefine the expression to match anything you want:

val R = """.*\.(.+)""".r
getFiles.collect{ case R(x) => x }.distinct
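As an illustration of redefining the expression, the sketch below compares the pattern above with a stricter variant. The stricter pattern, the lowercasing step, and the sample names are assumptions added for the example, not part of the original answer.

```scala
// Same pattern as above: capture everything after the last '.'.
val AnyExtension = """.*\.(.+)""".r

// Stricter variant (assumption for illustration): 1 to 4 alphanumeric characters only.
val ShortExtension = """.*\.([A-Za-z0-9]{1,4})""".r

// Sample data, made up for illustration.
val regexSample = Array("foo.jpg", "bar.JPG", "notes.backup", "foobar")

// A Regex used in a case pattern must match the whole string, so "foobar" is skipped.
// Lowercasing before distinct treats "jpg" and "JPG" as the same extension.
val anyExt = regexSample.collect { case AnyExtension(ext) => ext.toLowerCase }.distinct
val shortExt = regexSample.collect { case ShortExtension(ext) => ext.toLowerCase }.distinct
```

With the stricter pattern, "notes.backup" no longer matches because "backup" is longer than four characters.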
