Given the burgeoning interest in Hadoop and data analytics in general, it’s unsurprising that IT vendors and developers are looking for ways to speed up the process of sorting data and drawing insights from it. Enter “Drill,” a new open-source project proposed via the Apache Software Foundation’s incubator wing.
“There is a strong need in the market for low-latency interactive analysis of large-scale datasets, including nested data (eg, JSON, Avro, Protocol Buffers),” read the proposal submitted for the project. “This need was identified by Google and addressed internally with a system called Dremel.”
Over the past few years, a number of open-source frameworks have emerged to help data analysts and IT departments with scalable batch processing. Of these, Apache Hadoop has become the favorite of many organizations that need to crunch massive amounts of data. But in the eyes of Drill’s creators, Hadoop’s design prevents it from achieving “the sub-second latency needed for interactive data analysis and exploration.”
Drill, they added, “is intended to address this need.”
Drill’s architecture centers on four components: support for a variety of query languages and programming models, including DrQL (used by Dremel and Google BigQuery), Mongo Query Language, and Plume; a low-latency distributed execution engine capable of efficiently querying petabytes of data across 10,000 servers; a layer supporting both schema-based formats such as Protocol Buffers/Dremel and schema-less formats such as JSON; and a layer supporting various data sources, with an initial focus on “Hadoop as a data source.”
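To make the idea of querying schema-less nested data more concrete, here is a minimal Python sketch. It is not Drill’s implementation or API; the helper names (`get_path`, `select`) and the sample records are hypothetical, invented purely for illustration. It shows how a query layer might project dotted-path columns out of JSON records that do not all share the same shape, returning a null value where a field is absent instead of failing.

```python
import json


def get_path(record, path):
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def select(records, columns, where=None):
    """Yield one row per record that passes the optional filter,
    projecting the requested dotted-path columns."""
    for record in records:
        if where is None or where(record):
            yield {col: get_path(record, col) for col in columns}


# Hypothetical sample data: newline-delimited JSON with no declared schema.
# The second record has no "geo" field at all.
RAW = """
{"user": {"name": "ana", "geo": {"country": "BR"}}, "clicks": 3}
{"user": {"name": "bo"}, "clicks": 7}
{"user": {"name": "cy", "geo": {"country": "US"}}, "clicks": 1}
"""

records = [json.loads(line) for line in RAW.strip().splitlines()]

# Roughly: SELECT user.name, user.geo.country FROM records WHERE clicks >= 3
rows = list(
    select(
        records,
        ["user.name", "user.geo.country"],
        where=lambda r: r["clicks"] >= 3,
    )
)
print(rows)
```

With the sample data above, the query keeps the first two records, and the missing `geo` field in the second one comes back as `None`. A real engine of the kind the proposal describes would distribute this work across many servers and use a columnar representation, which this sketch does not attempt.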
Drill is also expected to support encryption on the wire eventually, although that is not among the project’s initial goals.
“Significant work” has apparently been done to identify Drill’s initial requirements and system architecture, with implementation of those four components named as the next step. Although there’s a growing need for tools capable of large-dataset analysis (look at the buzz around Hadoop), Drill’s creators acknowledge that any project of this scope carries inherent risks: vendors changing their data-analytics strategies could doom the project, although that scenario seems unlikely given the aforementioned interest.
The proposal seeks to downplay other potential dangers, including excessive reliance on salaried developers (“we are confident that the project will continue even if no salaried developers contribute to the project”) and relationships with other Apache products (“we look forward to collaborating with those communities, as well as other Apache communities”). Initial workers on the project include employees of MapR Technologies, Drawn to Scale, and Concurrent, with mentors from MapR Technologies, Lucid Imagination and Nokia.
http://slashdot.org/topic/bi/apache-drill-could-power-faster-through-data/