Notes on Deleting and Updating Documents with the Tantivy Full-Text Search Engine in Rust

Author: original content | Published: 2024-09-29


Tantivy is a high-performance full-text search engine written in Rust, and in many benchmarks it retrieves faster than the Java-based Lucene. Tantivy has no in-place update: to update a document you must first delete the old document and then add the new one. However, when I ran code along the following lines to delete a document, nothing was deleted:

    let id = schema.get_field("id").unwrap();
    let mut index_writer: IndexWriter = get_writer();
    let term: Term = Term::from_field_u64(id, id_value);
    index_writer.delete_term(term);
    index_writer.commit().unwrap();

After more than a day of debugging, I found the cause: the id field was only stored, never indexed. The corrected schema code is below:

    let mut schema_builder = Schema::builder();
    schema_builder.add_u64_field("id", NumericOptions::default().set_indexed().set_stored());
    schema_builder.add_text_field("title", TextOptions::default()
        .set_indexing_options(
            TextFieldIndexing::default()
                .set_tokenizer("jieba")
                .set_index_option(IndexRecordOption::WithFreqsAndPositions),
        )
        .set_stored());
    schema_builder.add_text_field("content", TextOptions::default()
        .set_indexing_options(
            TextFieldIndexing::default()
                .set_tokenizer("jieba")
                .set_index_option(IndexRecordOption::WithFreqsAndPositions),
        )
        .set_stored());

    let schema = schema_builder.build();
    let index = Index::create_in_dir(p, schema.clone()).unwrap();
    // Register the Chinese (jieba) tokenizer
    let tokenizer = tantivy_jieba::JiebaTokenizer {};
    index.tokenizers().register("jieba", tokenizer);
    return index;

The set_indexed() call on the second line of the code above is the key: delete_term can only match documents through a field that is indexed. If the field is merely stored, the delete silently has no effect.

Below is working code for creating, updating, and deleting per-document index entries:

// Build the index entry for a single article
pub async fn create_article_index(article_id: u64) {
    let schema = INDEX.schema();

    let id = schema.get_field("id").unwrap();
    let title = schema.get_field("title").unwrap();
    let content = schema.get_field("content").unwrap();

    let mut index_writer: IndexWriter = get_writer();

    // Load the article from the database
    let res = orm::get_article(article_id.try_into().unwrap()).await;
    if let Ok(list) = res {
        for item in list {
            let t_title = item.title.clone().unwrap_or_default();
            let t_content = item.content.clone().unwrap_or_default();
            let ids: u64 = item.id.try_into().unwrap();
            index_writer.add_document(doc!(
                id => ids,
                title => t_title,
                content => t_content
            )).unwrap();
        }

        index_writer.commit().unwrap();
    }
}
// Update the index entry for a single article: delete, then re-add
pub async fn update_article_index(id: u64) {
    del_article_index(id);
    create_article_index(id).await;
}
// Delete the index entry for a single article
pub fn del_article_index(idu: u64) {
    let schema = INDEX.schema();
    let id = schema.get_field("id").unwrap();
    let mut index_writer: IndexWriter = get_writer();

    // Only matches documents because the `id` field is indexed
    let term: Term = Term::from_field_u64(id, idu);
    println!("delete term built: {:?}", term);
    // delete_term returns an opstamp, not a count of deleted documents
    let del_res = index_writer.delete_term(term.clone());
    println!("delete opstamp: {}", del_res);
    // The deletion only takes effect once the commit succeeds
    let commit_stamp = index_writer.commit().unwrap();
    println!("deleted index entry for id {} (commit opstamp {})", idu, commit_stamp);
}

// Initialize the global Index
lazy_static! {
    static ref INDEX: Index = get_index();
}
fn get_index() -> Index {
    let _ = fs::create_dir("article_index");
    let p = Path::new("article_index");
    let index_res = Index::open_in_dir(p);

    match index_res {
        Ok(index) => {
            // Register the Chinese (jieba) tokenizer
            let tokenizer = tantivy_jieba::JiebaTokenizer {};
            index.tokenizers().register("jieba", tokenizer);
            index
        },
        Err(e) => {
            // The index does not exist yet: build the schema and create it
            println!("open_in_dir failed: {}", e);
            let mut schema_builder = Schema::builder();
            schema_builder.add_u64_field("id", NumericOptions::default().set_indexed().set_stored());
            schema_builder.add_text_field("title", TextOptions::default()
                .set_indexing_options(
                    TextFieldIndexing::default()
                        .set_tokenizer("jieba")
                        .set_index_option(IndexRecordOption::WithFreqsAndPositions),
                )
                .set_stored());
            schema_builder.add_text_field("content", TextOptions::default()
                .set_indexing_options(
                    TextFieldIndexing::default()
                        .set_tokenizer("jieba")
                        .set_index_option(IndexRecordOption::WithFreqsAndPositions),
                )
                .set_stored());

            let schema = schema_builder.build();
            let index = Index::create_in_dir(p, schema.clone()).unwrap();
            // Register the Chinese (jieba) tokenizer
            let tokenizer = tantivy_jieba::JiebaTokenizer {};
            index.tokenizers().register("jieba", tokenizer);
            index
        }
    }
}
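The code above calls a get_writer() helper that the article never shows. A plausible minimal version, assuming the global INDEX above and an arbitrary 50 MB writer heap, would be:

```rust
// Hypothetical reconstruction of the get_writer() helper used above;
// the original article does not show its definition.
fn get_writer() -> IndexWriter {
    // 50 MB heap is an assumed value; tantivy enforces a minimum budget
    INDEX.writer(50_000_000).expect("failed to create IndexWriter")
}
```

Note that tantivy allows only one live IndexWriter per index (it holds a lock file), so a helper like this works for the sequential calls above but will fail if two writers are requested concurrently.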