Golang client for Impala
Because the Apache Impala driver for Golang does not support Kerberos authentication yet, we needed to find a workaround. Such a difficult task (:
The two ports that clients most commonly use to connect to Impala are 21000 (Beeswax) and 21050 (HiveServer2).
Kerberos
There are several options for Kerberos authentication, but the simplest is to manually generate a Kerberos ticket before connecting to Impala. In production we actually run kinit before opening any connection to Impala.
The first step is to install Kerberos, then get the krb5.conf and keytab files, and then generate the ticket:
    sudo apt install krb5-user
    kinit -V -kt /tmp/my.user.keytab my.user
    klist
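Since we run kinit before opening any connection, the same step can be driven from the Go program itself. The following is only a minimal sketch, assuming MIT Kerberos kinit is on the PATH and accepts the -kt flags; the helper names kinitArgs and renewTicket are made up for illustration and are not part of any Impala library.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// kinitArgs builds the arguments for getting a ticket from a keytab.
// Hypothetical helper: -V is verbose, -kt selects the keytab file.
func kinitArgs(principal, keytab string) []string {
	return []string{"-V", "-kt", keytab, principal}
}

// renewTicket runs kinit and includes its output in the error on failure.
func renewTicket(principal, keytab string) error {
	out, err := exec.Command("kinit", kinitArgs(principal, keytab)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("kinit failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	if err := renewTicket("my.user", "/tmp/my.user.keytab"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Open the Impala connection here, once the ticket cache is populated.
}
```

Tickets expire, so a long-running service would probably need to call something like renewTicket periodically, not only at startup.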
Impalathing
The Go impalathing package implements the Thrift interface and connects to Impala on port 21000.
When we used it in production we had some issues: it executed insert queries for 10-15 minutes and then failed with "write tcp x.x.x.x:13196->x.x.x.x:21000: write: broken pipe".
We found that the big data team could not allocate more resources on that port, so we started looking for another solution that connects to the HiveServer2 port.
    conn, err := impalathing.Connect(
        "impala.example.com",
        21000,
        func() impalathing.Option {
            return func(o *impalathing.Options) {
                o.SaslTransportConfig = map[string]string{
                    "mechanismName": "GSSAPI",
                    "service":       "impala",
                }
                o.ConnectionTimeout = 1
            }
        }(),
    )
Gohive
Gohive implements the Thrift interface and connects to Impala on port 21050.
The package was created for the "Spark Distributed SQL Engine", and we had to adapt it to work with Impala.
You can run queries synchronously or asynchronously.
It works smoothly in production. In synchronous mode, when the query pool is full, the call blocks until the query is processed, so you have to take this into consideration.
    configuration := NewConnectConfiguration()
    configuration.Service = "impala"
    configuration.TLSConfig = &tls.Config{} // just enables SSL, we did not need any certs
    configuration.HiveConfiguration = map[string]string{
        "request_pool": "prod.pool",
    }
    connection, errConn := Connect("impala.example.com", 21050, "KERBEROS", configuration)
We ended up using Gohive, and it has proved to be reliable in our production environment.