Documentation/How Tos/Removing Duplicate Rows

From Apache OpenOffice Wiki
Jump to: navigation, search

Removing Duplicate Rows in Calc

Q: How do I select all unique values from a column in an Apache OpenOffice Calc spreadsheet? I want to remove all redundant rows from a report with thousands of records.

A: There is no automatic function to remove duplicate rows. Follow these steps to delete all rows that have duplicate values in column A:

  1. Select all cells of the current data range.
    On most systems, you can click any cell inside the data range, then press  Ctrl  +  *  key on the numeric keypad.
  2. Sort the data range by column A.
    Choose Data → Sort.
  3. Click an empty cell in the first row. Let's assume it is cell C1. Enter the formula:
    This will display 1 if the current row has the same value in column A as the next row. It will display 0 if the values are different.
  4. Copy the formula down for all rows of the data range.
    Drag the lower right edge of the cell C1 down to the last row.
  5. Now the formulas must be replaced by their values to freeze the contents.
    While the column C is still highlighted, press  Ctrl  +  C  to copy all selected cells to the clipboard.
  6. Press  ⇧ Shift  +  Ctrl  +  V  to open the Paste Special dialog box.
    In the Selection area, enable only the Numbers command; disable the other Selection commands. Click  OK .
  7. Select the whole data range including the new column C and sort the range by column C.
    Choose Data → Sort.
  8. Select all rows which have a value 1 in column C, then press  Del  key.
  9. Optional steps: Delete column C. Select the remaining rows and sort them again by column A.

Tip: This method can also be used to remove duplicate text lines from a Writer document. Copy the text to a Calc spreadsheet. Remove the duplicates, then copy and paste back as "unformatted text" to Writer.

Content on this page is licensed under the Public Documentation License (PDL).
Personal tools