Here's a quick example I whipped up:
```python
import pyspark.ml as ml
import pyspark.ml.feature as ft
import pyspark.ml.classification as cl

# `sc` is the SparkContext that the pyspark shell/notebook provides
data = sc.parallelize([
    (1, 'two',  3.4, 0),
    (2, 'four', 9.1, 1),
    (3, 'one',  2.1, 0),
    (4, 'five', 2.6, 0)
]).toDF(['id', 'counter', 'continuous', 'result'])

# index the string column, one-hot encode it, then assemble all
# features into a single vector column for the classifier
si = ft.StringIndexer(inputCol='counter', outputCol='counter_idx')
ohe = ft.OneHotEncoder(inputCol=si.getOutputCol(), outputCol='counter_enc')
va = ft.VectorAssembler(inputCols=['counter_enc', 'continuous'], outputCol='features')
lr = cl.LogisticRegression(maxIter=5, featuresCol='features', labelCol='result')

pip = ml.Pipeline(stages=[si, ohe, va, lr])
pip.fit(data).transform(data).select(data.columns + ['probability', 'prediction']).show()
```
You can also check out the notebooks accompanying Denny's and my book: https://github.com/drabastomek/learningPySpark/blob/master/Chapter06/LearningPySpark_Chapter06.ipynb
Hope this helps.