pyspark

Version

com.microsoft.ml.spark:mmlspark_2.11:jar:0.18.1
spark=2.4.3
scala=2.11.12

Data (CSV with header): https://gist.github.com/ttpro1995/69051647a256af912803c9a16040f43a

Download the data, save it as a CSV file, and put it at /data/public/HIGGS/higgs.test.predictioncsv

val data = spark.read.option("header","true").option("inferSchema", "true").csv("/data/public/HIGGS

Hello everyone,
Recently I tried to set up petastorm on my company's Hadoop cluster.
However, because the cluster uses Kerberos for authentication, petastorm failed to connect.
I figured out that petastorm relies on pyarrow, which does support Kerberos authentication.
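
To illustrate the point about pyarrow and Kerberos, here is a minimal sketch, not the change described in this post. It assumes a recent pyarrow in which `pyarrow.fs.HadoopFileSystem` accepts a `kerb_ticket` argument (the older `pyarrow.hdfs.connect` call took a parameter of the same name). The helper names `kerberos_hdfs_kwargs` and `connect_hdfs`, the default host and port, and the fallback to the `KRB5CCNAME` environment variable are all assumptions made for the example.

```python
import os


def kerberos_hdfs_kwargs(host="default", port=8020, ticket_cache=None):
    """Build keyword arguments for a Kerberos-authenticated HDFS connection.

    Hypothetical helper: if no ticket cache path is given, fall back to the
    KRB5CCNAME environment variable (an assumption about the environment).
    """
    if ticket_cache is None:
        ticket_cache = os.environ.get("KRB5CCNAME")
    return {"host": host, "port": port, "kerb_ticket": ticket_cache}


def connect_hdfs(host="default", port=8020, ticket_cache=None):
    """Open an HDFS filesystem handle using the Kerberos ticket cache.

    pyarrow is imported lazily because connecting also requires a local
    Hadoop client installation (libhdfs) and a valid Kerberos ticket.
    """
    import pyarrow.fs as pafs

    return pafs.HadoopFileSystem(**kerberos_hdfs_kwargs(host, port, ticket_cache))
```

On a cluster node, something like `connect_hdfs("namenode.example.com", 8020)` (hostname hypothetical) would be called after running `kinit`, so that a ticket cache exists for pyarrow to pick up.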

I hacked line 250 of "petastorm/petastorm/hdfs/namenode.py" and replaced it with:

driver = 'libhdfs'
return pyarrow.hdfs.c

Here are 1,043 public repositories matching this topic...

Azure / mmlspark

JohnSnowLabs / spark-nlp

WeBankFinTech / Linkis

jadianes / spark-py-notebooks

awesome-spark / awesome-spark

ironmussa / Optimus

uber / petastorm

jupyter-incubator / sparkmagic

WeBankFinTech / Scriptis

AlexIoannides / pyspark-example-project

ericxiao251 / spark-syntax

HariSekhon / DevOps-Python-tools

ekampf / PySpark-Boilerplate

awesome-spark / spark-gotchas

CamDavidsonPilon / tdigest

Morphl-AI / MorphL-Community-Edition

paypal / gimel

XD-DENG / Spark-practice

MrPowers / quinn

Azure / azure-cosmosdb-spark

titicaca / spark-iforest

dvgodoy / handyspark

awantik / pyspark-learning

runawayhorse001 / LearningApacheSpark

RubensZimbres / Repo-2019

commoncrawl / cc-pyspark

tirthajyoti / Spark-with-Python

mahmoudparsian / big-data-mapreduce-course

wadhwasahil / Relation_Extraction

archivesunleashed / aut
