public class VectorizedArrowReader extends java.lang.Object implements VectorizedReader<VectorHolder>
Vector reader that reads a batch of values into Arrow vectors. It also takes care of
allocating the right kind of Arrow vectors depending on the corresponding
Iceberg/Parquet data types.

Modifier and Type | Class and Description |
---|---|
static class | VectorizedArrowReader.ConstantVectorReader<T>: A dummy vector reader that doesn't actually read files; instead, it returns a dummy VectorHolder indicating the constant value to use for this column. |
static class | VectorizedArrowReader.DeletedVectorReader: A dummy vector reader that doesn't actually read files. |

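
The constant-reader idea described above can be illustrated with a minimal, self-contained sketch. It does not use the Iceberg or Arrow APIs: `ConstantHolder` and `ConstantReaderSketch` are hypothetical stand-ins invented for this example, and the real `ConstantVectorReader<T>` and `VectorHolder` may be structured quite differently. The only point shown is that such a reader can answer `read` without touching a file, by returning a holder that carries a row count and a single value.

```java
/** Hypothetical holder describing one batch of a column; a stand-in, not Iceberg's VectorHolder. */
final class ConstantHolder<T> {
  private final int numRows;
  private final T constantValue;

  ConstantHolder(int numRows, T constantValue) {
    this.numRows = numRows;
    this.constantValue = constantValue;
  }

  /** Number of rows this batch represents. */
  int numRows() {
    return numRows;
  }

  /** The single value that applies to every row in the batch. */
  T constantValue() {
    return constantValue;
  }
}

/** Hypothetical reader that never opens a file: each batch is just (row count, constant). */
final class ConstantReaderSketch<T> {
  private final T value;

  ConstantReaderSketch(T value) {
    this.value = value;
  }

  /**
   * Returns a holder for numValsToRead rows. The reuse argument is ignored here
   * because this sketch keeps no buffers that could be recycled.
   */
  ConstantHolder<T> read(ConstantHolder<T> reuse, int numValsToRead) {
    return new ConstantHolder<>(numValsToRead, value);
  }

  public static void main(String[] args) {
    ConstantReaderSketch<String> reader = new ConstantReaderSketch<>("2024-01-01");
    ConstantHolder<String> batch = reader.read(null, 5000);
    System.out.println(batch.numRows() + " rows, all equal to " + batch.constantValue());
  }
}
```

A consumer of such a holder would expand the constant lazily, or not at all, instead of materializing one value per row.
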
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_BATCH_SIZE |

Constructor and Description |
---|
VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector) |

Modifier and Type | Method and Description |
---|---|
void | close(): Release any resources allocated. |
static VectorizedArrowReader | nulls() |
static VectorizedArrowReader | positions() |
static VectorizedArrowReader | positionsWithSetArrowValidityVector() |
VectorHolder | read(VectorHolder reuse, int numValsToRead): Reads a batch of numValsToRead rows into a VectorHolder. |
void | setBatchSize(int batchSize) |
void | setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition): Sets the row group information to be used with this reader. |
java.lang.String | toString() |

public static final int DEFAULT_BATCH_SIZE
public VectorizedArrowReader(org.apache.parquet.column.ColumnDescriptor desc, Types.NestedField icebergField, org.apache.arrow.memory.BufferAllocator ra, boolean setArrowValidityVector)
public void setBatchSize(int batchSize)
Specified by: setBatchSize in interface VectorizedReader<VectorHolder>
public VectorHolder read(VectorHolder reuse, int numValsToRead)
Description copied from interface VectorizedReader: reads a batch of numValsToRead rows.
Specified by: read in interface VectorizedReader<VectorHolder>
Parameters:
reuse - container from the last batch, to be reused for the next batch
numValsToRead - number of rows to read
public void setRowGroupInfo(org.apache.parquet.column.page.PageReadStore source, java.util.Map<org.apache.parquet.hadoop.metadata.ColumnPath,org.apache.parquet.hadoop.metadata.ColumnChunkMetaData> metadata, long rowPosition)
Description copied from interface VectorizedReader: sets the row group information to be used with this reader.
Specified by: setRowGroupInfo in interface VectorizedReader<VectorHolder>
Parameters:
source - row group information for all the columns
metadata - map of ColumnPath -> ColumnChunkMetaData for the row group
rowPosition - the row group's row offset in the Parquet file
public void close()
Description copied from interface VectorizedReader: releases any resources allocated.
Specified by: close in interface VectorizedReader<VectorHolder>
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public static VectorizedArrowReader nulls()
public static VectorizedArrowReader positions()
public static VectorizedArrowReader positionsWithSetArrowValidityVector()
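
The instance methods documented above suggest a lifecycle of configuring the batch size, pointing the reader at a row group, reading batches while passing the previous container back as `reuse`, and finally closing the reader. The following is a minimal sketch of that pattern only, under the assumption that this is the intended call order. `SketchBatchReader`, `IntBatch`, `InMemoryIntReader`, and `setSource` are hypothetical names invented for the example; `setSource` stands in for `setRowGroupInfo`, whose real signature takes Parquet types, and nothing here calls the actual Iceberg, Parquet, or Arrow APIs.

```java
/** Hypothetical, simplified mirror of the reader contract; real signatures use Parquet and Arrow types. */
interface SketchBatchReader<T> extends AutoCloseable {
  void setBatchSize(int batchSize);

  void setSource(int[] rowGroupValues); // stand-in for setRowGroupInfo(...)

  T read(T reuse, int numValsToRead);

  @Override
  void close();
}

/** Hypothetical container: a fixed-capacity buffer plus the number of valid entries. */
final class IntBatch {
  final int[] values;
  int size;

  IntBatch(int capacity) {
    this.values = new int[capacity];
  }
}

/** In-memory reader used only to demonstrate the call sequence and container reuse. */
final class InMemoryIntReader implements SketchBatchReader<IntBatch> {
  private int batchSize = 4;
  private int[] source = new int[0];
  private int position;
  private boolean closed;

  @Override
  public void setBatchSize(int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
  }

  @Override
  public void setSource(int[] rowGroupValues) {
    this.source = rowGroupValues;
    this.position = 0;
  }

  @Override
  public IntBatch read(IntBatch reuse, int numValsToRead) {
    if (closed) {
      throw new IllegalStateException("reader is closed");
    }
    // Recycle the previous container when it is large enough; otherwise allocate a new one.
    IntBatch batch = (reuse != null && reuse.values.length >= numValsToRead)
        ? reuse
        : new IntBatch(Math.max(batchSize, numValsToRead));
    int n = Math.min(numValsToRead, source.length - position);
    System.arraycopy(source, position, batch.values, 0, n);
    batch.size = n;
    position += n;
    return batch;
  }

  @Override
  public void close() {
    closed = true;
    source = new int[0];
  }
}

final class ReaderLifecycleSketch {
  /** Drives configure, read-with-reuse, and close; returns the sum of all values read. */
  static long sumAll(int[] rowGroup, int batchSize) {
    long total = 0;
    try (InMemoryIntReader reader = new InMemoryIntReader()) {
      reader.setBatchSize(batchSize);
      reader.setSource(rowGroup);
      IntBatch batch = null;
      int remaining = rowGroup.length;
      while (remaining > 0) {
        int toRead = Math.min(batchSize, remaining);
        batch = reader.read(batch, toRead); // hand the last container back for reuse
        for (int i = 0; i < batch.size; i++) {
          total += batch.values[i];
        }
        remaining -= batch.size;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sumAll(new int[] {1, 2, 3, 4, 5, 6, 7}, 3));
  }
}
```

Passing the previous container back into `read` lets a reader avoid allocating a fresh buffer per batch, which is presumably the purpose of the `reuse` parameter; the try-with-resources block makes sure `close()` runs even if a read fails.
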