Producing useful commands on the go using C++ and AI

Daniel Lemire's blog
A few weeks ago, I attended a software engineering seminar focused on the role of large language models in programming. The distinguished software engineering professors in attendance were notably skeptical, frequently dismissing the technology’s potential. During a break, I turned to a senior professor, a longtime colleague, and playfully remarked, “ChatGPT writes better C++ than you!” The comment was partly in jest, given our years of camaraderie, but it carried a kernel of conviction. I genuinely believe that these models, while limited, demonstrate remarkable proficiency in generating code, challenging traditional views on programming expertise.



As technologists, we often rely on small scripts to manipulate data or automate repetitive tasks. Like many software developers, I typically use high-level languages such as Python, awk, or Bash for these purposes, as they are well-suited for the job.

However, I wanted to explore whether recent advancements in generative AI could shift this approach. To test this, I turned to Grok, a competitor to ChatGPT, and tasked it with a small challenge: write a concise program in C++23 (compatible with GCC 13, LLVM 18, or later) to parse simple CSV files, identify column labels, and count the number of distinct values in each column. A CSV (Comma-Separated Values) file is a plain text format used to store tabular data, where each line represents a row and columns are separated by commas. Here is an example where the first line holds the column labels:
Name,Age,City
Alice,30,New York
Bob,25,London
Charlie,12,Paris
David,40,Longueuil
David,35,Tokyo
I got back the following answer, which is, as far as I can tell, correct. It will not support fancy CSV files with quotes and escaped characters, but that is fine.
// assume clang++ 18 or g++ 14 (the libstdc++ shipped with GCC 13 lacks <print>)
// compile with: c++ -std=c++23 -o csv csv.cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <print>
#include <ranges>
#include <sstream>
#include <string>
#include <unordered_set>

int main(int argc, char *argv[]) {
  if (argc != 2) {
    std::println("Usage: {} <csv_file>", argv[0]);
    return EXIT_FAILURE;
  }
  std::ifstream file(argv[1]);
  if (!file.is_open()) {
    std::println("Error: Cannot open file {}", argv[1]);
    return EXIT_FAILURE;
  }
  // Read first line for column labels
  std::string line;
  if (!std::getline(file, line)) {
    std::println("Error: Empty file {}", argv[1]);
    return EXIT_FAILURE;
  }

  // Parse first line into labels
  std::map<size_t, std::string> labels;
  auto cells = line | std::ranges::views::split(',');
  for (auto cell : cells) {
    std::string label(cell.begin(), cell.end());
    labels[labels.size()] = label;
  }
  std::map<std::string, std::unordered_set<std::string>> columns;
  while (std::getline(file, line)) {
    auto cells = line | std::ranges::views::split(',');
    for (auto [idx, cell] : std::ranges::views::enumerate(cells)) {
      columns[labels[idx]].insert(std::string(cell.begin(), cell.end()));
    }
  }
  // Print results using labels
  for (const auto &[label, values] : columns) {
    std::println("Column {}: {} distinct values", label, values.size());
  }
  return EXIT_SUCCESS;
}

This C++ program leverages modern C++ features to parse a CSV file and count distinct values in each column, making extensive use of the Ranges library introduced in C++20. It uses std::ifstream for file input, reading the CSV file specified via the command-line arguments (argc, argv). The first line is parsed into a std::map<size_t, std::string> that associates each column index with its label.

The std::ranges::views::split range adaptor splits lines by commas, turning a string into a range of subranges (one per cell), which is more composable and expressive than traditional string tokenization. Similarly, std::ranges::views::enumerate (new in C++23) pairs each cell with its index during iteration, giving clean access to both the cell value and its corresponding column label without manual index tracking. These range-based operations exemplify functional-style programming, allowing lazy evaluation and pipeline-like data processing.

A std::map<std::string, std::unordered_set<std::string>> tracks distinct values per column: each label maps to a std::unordered_set, which makes uniqueness checks efficient. The program reads the remaining lines, populates the sets with cell values, and uses std::println (C++23) to print each column's label and distinct value count. Error handling covers a file that cannot be opened and an empty file, and the program returns EXIT_SUCCESS or EXIT_FAILURE accordingly. By using ranges, the code achieves greater readability and abstraction, reducing boilerplate compared to traditional iterator-based loops.

I am not suggesting you abandon Python for C++; this was purely an experimental dive into AI-driven code generation. That said, I’m genuinely impressed by the results. However, without a solid understanding of C++, these AI tools offer limited value, as the ability to critically assess and refine the output remains essential.

 
