The underlying JVM object is a SchemaRDD, not a PythonRDD, so we can
  utilize the relational query API exposed by Spark SQL.
  This class receives raw tuples from Java but assigns a class to them in
  all its data-collection methods (mapPartitionsWithIndex, collect, take,
  etc.) so that PySpark sees them as Row objects with named fields.
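The named-field conversion described above can be sketched in plain Python with `collections.namedtuple` — an illustration only, not PySpark's actual Row class, which is richer:

```python
from collections import namedtuple

# A raw tuple coming back from the JVM side carries no field names,
# so positional access is all a caller gets.
raw = ("Alice", 1)

# Assigning a class with named fields (roughly what SchemaRDD does
# with Row) lets callers address columns by name instead of position.
Row = namedtuple("Row", ["name", "age"])
row = Row(*raw)

print(row.name)  # Alice
print(row.age)   # 1
```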
__init__(self, jschema_rdd, sql_ctx)
    x.__init__(...) initializes x; see help(type(x)) for signature.
insertInto(self, tableName, overwrite=False)
    Inserts the contents of this SchemaRDD into the specified table.
saveAsTable(self, tableName)
    Creates a new table with the contents of this SchemaRDD.
schemaString(self)
    Returns the output schema in tree format.
printSchema(self)
    Prints out the schema in tree format.
mapPartitionsWithIndex(self, f, preservesPartitioning=False)
    Return a new RDD by applying a function to each partition of this
    RDD, while tracking the index of the original partition.
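A rough pure-Python analogue (no Spark required; the helper names here are hypothetical) shows what tracking the partition index looks like when `f` receives `(partition_index, iterator)`:

```python
def map_partitions_with_index(partitions, f):
    # Emulate mapPartitionsWithIndex on a list of per-partition lists:
    # f receives (partition_index, iterator) and yields output records.
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]

def tag(index, iterator):
    # Pair every record with the index of the partition it came from.
    for record in iterator:
        yield (index, record)

parts = [["a", "b"], ["c"]]
print(map_partitions_with_index(parts, tag))
# [[(0, 'a'), (0, 'b')], [(1, 'c')]]
```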
        
cache(self)
    Persist this RDD with the default storage level (MEMORY_ONLY_SER).
        
persist(self, storageLevel)
    Set this RDD's storage level to persist its values across operations
    after the first time it is computed.
        
unpersist(self, blocking=True)
    Mark the RDD as non-persistent, and remove all blocks for it from
    memory and disk.
        
coalesce(self, numPartitions, shuffle=False)
    Return a new RDD that is reduced into `numPartitions` partitions.
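The no-shuffle case can be emulated in plain Python. This is a simplification: Spark groups partitions by locality, while simple adjacent chunking stands in for that here, and the function name is hypothetical:

```python
def coalesce(partitions, num_partitions):
    # Rough emulation of coalesce(shuffle=False): merge adjacent
    # partitions so at most num_partitions remain.
    num_partitions = max(1, min(num_partitions, len(partitions)))
    size, rem = divmod(len(partitions), num_partitions)
    out, i = [], 0
    for k in range(num_partitions):
        step = size + (1 if k < rem else 0)
        out.append([x for part in partitions[i:i + step] for x in part])
        i += step
    return out

parts = [[1], [2, 3], [4], [5, 6]]
print(coalesce(parts, 2))  # [[1, 2, 3], [4, 5, 6]]
```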
        
subtract(self, other, numPartitions=None)
    Return each value in `self` that is not contained in `other`.
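The semantics can be sketched over plain lists (an analogue only; in this sketch, duplicate values in `self` survive so long as the value never appears in `other`):

```python
def subtract(self_values, other_values):
    # Keep each value of self_values that never appears in
    # other_values; duplicates on the left side are preserved.
    other = set(other_values)
    return [v for v in self_values if v not in other]

print(subtract([1, 2, 2, 3, 4], [2, 4]))  # [1, 3]
```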
  
Inherited from rdd.RDD: __add__, __repr__, aggregate, aggregateByKey, cartesian, cogroup, collectAsMap, combineByKey, context, countByKey, countByValue, filter, first, flatMap, flatMapValues, fold, foldByKey, foreach, foreachPartition, getNumPartitions, getStorageLevel, glom, groupBy, groupByKey, groupWith, histogram, id, join, keyBy, keys, leftOuterJoin, map, mapPartitions, mapPartitionsWithSplit, mapValues, max, mean, min, name, partitionBy, pipe, reduce, reduceByKey, reduceByKeyLocally, rightOuterJoin, sample, sampleByKey, sampleStdev, sampleVariance, saveAsHadoopDataset, saveAsHadoopFile, saveAsNewAPIHadoopDataset, saveAsNewAPIHadoopFile, saveAsPickleFile, saveAsSequenceFile, saveAsTextFile, setName, sortBy, sortByKey, stats, stdev, subtractByKey, sum, take, takeOrdered, takeSample, toDebugString, top, union, values, variance, zip, zipWithIndex, zipWithUniqueId

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__